ABSTRACT: We present a model of similarity-based retrieval that attempts to capture
three seemingly contradictory psychological phenomena: (1) structural commonalities are 
weighed more heavily than surface commonalities in similarity judgments for items in
working memory; (2) in retrieval, superficial similarity is more important than structural 
similarity; and yet (3) purely structural (analogical) remindings are sometimes 
experienced. Our model, MAC/FAC, explains these phenomena in terms of a two-stage 
process. The first stage uses a computationally cheap, non-structural matcher to filter 
candidate long-term memory items. It uses content vectors, a redundant encoding of 
structured representations whose dot product estimates how well the corresponding 
structural representations will match. The second stage uses SME to compute structural 
matches on the handful of items found by the first stage. We show the utility of the 
MAC/FAC model through a series of computational experiments: (1) We demonstrate 
that MAC/FAC can model patterns of access found in psychological data, (2) We argue 
via sensitivity analyses that these simulation results rely on the theory, and (3) We 
compare the performance of MAC/FAC with ARCS, an alternate model of similarity-based
retrieval, and demonstrate that MAC/FAC explains the data better than ARCS.
Finally, we discuss limitations and possible extensions of the model, relationships with 
other recent retrieval models, and place MAC/FAC in the context of other recent work on
the nature of similarity. 
1. Introduction



Similarity-based remindings range from the sublime to the stupid. At one extreme, seeing the periodic table 
of elements reminds one of octaves in music. At the other, a bicycle reminds one of a pair of eyeglasses.
Often, remindings are neither brilliant nor superficial but simply mundane, as when a bicycle reminds one 
of another bicycle. Theoretical attention is inevitably drawn to spontaneous analogy: i.e., to structural 
similarity unsupported by surface similarity, as in the octave/periodic table comparison. Such remindings 
seem clearly insightful and seem linked to the creative process, and should be included in any model of 
retrieval. But as we review below, research on the psychology of memory retrieval points to a
preponderance of the latter two types of similarity: (mundane) literal similarity, based on both structural
and superficial commonalities, and (dumb) superficial similarity, based on surface commonalities. A
major challenge for research on similarity-based reminding is to devise a model that will produce chiefly
literal-similarity and superficial remindings, but still produce occasional analogical remindings.

A further constraint on models of access comes from considering the role of similarity in transfer and 
inference. The large number of superficial remindings indicates that retrieval is not very sensitive to 
structural soundness. But appropriate transfer requires structural soundness, so that knowledge can be 
exported from one description into another. And psychological evidence (also discussed below) indicates 
that the mapping process involved in transfer is actually very sensitive to structural soundness. Hence our 
memories often give us information we don't want, which at first seems somewhat paradoxical. Any model 
of retrieval should explain this paradox. 

This paper presents MAC/FAC, a model of similarity-based reminding that attempts to capture these
phenomena. MAC/FAC models similarity-based retrieval as a two-stage process. The first stage (MAC)
uses a cheap, non-structural matcher to quickly filter potentially relevant items from a pool of such items. 
These potential matches are then processed in the FAC stage by a more powerful (but more expensive) 
structural matcher, based on the structure-mapping notion of literal similarity (Gentner, 1983). 

We begin in Section 2 by briefly reviewing psychological evidence on similarity-based retrieval and 
mapping, thereby extracting some criteria which retrieval models must satisfy. This section also outlines
the computational issues raised by similarity-based retrieval, drawing on the AI literature as necessary.
Section 3 describes the MAC/FAC model, showing how it satisfies the psychological and computational
desiderata. Section 4 illustrates the model's psychological plausibility by simulating the results of a psychological
experiment. Section 5 explores the consequences of different design decisions by sensitivity analyses at the
level of algorithms, demonstrating that the model's performance depends on the theoretically important
parameters. Section 6 compares MAC/FAC with ARCS, the closest competing model of similarity-based
retrieval, demonstrating that MAC/FAC performs well on databases designed by others (e.g., the ARCS
datasets) and that MAC/FAC's performance fits the psychological evidence better than that of ARCS. Finally,
Section 7 compares MAC/FAC to several other memory models, analyzes some of its limitations, and 
discusses possible extensions. 

2. Framework 

Similarity-based transfer can be decomposed into subprocesses. Given that a person has some current 
target situation in working memory, transfer from prior knowledge requires at least 
1. Accessing a similar (base) situation in long-term memory,

2. Creating a mapping from the base to the target, and

3. Evaluating the mapping.

In this case the base is an item from memory, and the target is the probe; that is, we think of the retrieved 
memory items as mapped to the probe. Other processes may also occur - verifying new inferences about
the target (Clement, 1986), elaborating the base and target (Ross, 1987; Falkenhainer, 1988), adapting or 
tweaking the domain representations to improve the match (Falkenhainer, 1990; Holyoak & Novick, in 
press; Kass, 1986, 1989), and abstracting the common structure from base and target (Gick & Holyoak, 
1983; Skorstad, Gentner & Medin, 1988; Winston, 1982) - but our focus is on the first three processes. 

2.1 Structure-Mapping and the typology of similarity

The process of mapping aligns two representations and uses this alignment to generate analogical 
inferences (Gentner, 1983, 1988, 1989a). Alignment occurs via matching, which creates correspondences 
between items in the two representations. Analogical inferences are generated by using the 
correspondences to import knowledge from the base representation into the target. The mapping process is 
assumed to be governed by the constraints of structural consistency: one-to-one mapping and parallel 
connectivity. One-to-one mapping means that an interpretation of a comparison cannot align (i.e., place
into correspondence) the same item in the base with multiple items in the target, or vice versa. Parallel
connectivity means that if an interpretation of a comparison aligns two statements, their arguments must
also be placed into correspondence.^1 In this account, similarity is defined in terms of correspondences
between structured representations (Gentner 1983; Gentner & Markman, 1993, 1994, in press; Goldstone,
Medin & Gentner, 1991; Markman & Gentner, 1990, 1993a,b; Medin, Goldstone & Gentner, 1993).
Matches can be distinguished according to the kinds of commonalities present. An analogy is a match
based on a common system of relations, especially involving higher-order relations.^2 A literal similarity
match includes both common relational structure and common object descriptions. A surface similarity or 
mere-appearance match is based primarily on common object descriptions, with perhaps a few shared 
first-order relations. 
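
The two structural-consistency constraints can be stated as checks over a set of correspondences. The following is a minimal sketch, not SME's implementation: it assumes that entities are encoded as strings and statements as tuples of the form (functor, arg1, arg2, ...), and every function name in it is hypothetical.

```python
from typing import Dict, List, Tuple, Union

# Assumed encoding: an entity is a string; a statement is a tuple
# (functor, arg1, arg2, ...). This is not SME's actual data structure.
Item = Union[str, tuple]
Pair = Tuple[Item, Item]  # (base item, target item)


def is_one_to_one(pairs: List[Pair]) -> bool:
    """No base item is paired with two target items, and vice versa."""
    base_to_target: Dict[Item, Item] = {}
    target_to_base: Dict[Item, Item] = {}
    for base, target in pairs:
        if base_to_target.setdefault(base, target) != target:
            return False
        if target_to_base.setdefault(target, base) != base:
            return False
    return True


def has_parallel_connectivity(pairs: List[Pair]) -> bool:
    """If two statements are aligned, their arguments are aligned in order."""
    aligned = set(pairs)
    for base, target in pairs:
        if isinstance(base, tuple) and isinstance(target, tuple):
            if len(base) != len(target):
                return False
            for base_arg, target_arg in zip(base[1:], target[1:]):
                if (base_arg, target_arg) not in aligned:
                    return False
    return True


def is_structurally_consistent(pairs: List[Pair]) -> bool:
    return is_one_to_one(pairs) and has_parallel_connectivity(pairs)


base_stmt = ("REVOLVES-AROUND", "planet", "sun")
target_stmt = ("REVOLVES-AROUND", "electron", "nucleus")

good = [(base_stmt, target_stmt), ("planet", "electron"), ("sun", "nucleus")]
two_targets = good + [("sun", "electron")]  # violates one-to-one mapping
missing_args = [(base_stmt, target_stmt), ("planet", "electron")]  # violates parallel connectivity
```

Here `good` passes both checks, `two_targets` fails because the Sun is paired with two target items, and `missing_args` fails because the aligned statements leave one pair of arguments unaligned.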

There is considerable evidence that the mapping process is sensitive to structural commonalities. People 
can readily align two situations, preserving structurally important commonalities, making the appropriate
lower-order substitutions, and mapping additional predicates into the target as candidate inferences. For 
example, Clement & Gentner (1991) showed people analogies and asked which of two lower-order 
assertions, both shared by base and target, was most important to the match. Subjects chose assertions that 
were connected to matching causal antecedents: that is, their choice was based not only on the goodness of 
the local match but on whether it was connected to a larger matching system. In a second study, subjects 
were asked to make a new prediction about the target based on the analogy with the base story. They again 
showed sensitivity to connectivity and systematicity in choosing which predicates to map as candidate 
inferences from base to target. Evidence for structural consistency in mapping comes from a study by 
Spellman and Holyoak (1992). They asked people to explicate the analogy between the Gulf War and 
World War II, assuming Saddam Hussein maps onto Hitler. Although people were divided in their
mappings, they were highly consistent. People who mapped Bush onto Churchill mapped the current USA
onto WWII Britain and people who mapped Bush onto FDR mapped the USA today onto USA during
World War II.



^1 Previously we used the term structurally grounded for parallel connectivity.

^2 We define the order of an item in a representation as follows: Objects and constants are order 0. The order of a statement is one plus the maximum of the order of its arguments.
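
The definition of order is recursive and can be written down directly. This is a small sketch under the assumption that statements are encoded as tuples (functor, arg1, ...) and everything else counts as an object or constant; the function name and the example statements are illustrative only.

```python
def order(item) -> int:
    """Order 0 for objects and constants; 1 + max argument order for statements."""
    if not isinstance(item, tuple):
        return 0
    # default=0 covers a statement with no arguments, a case the definition
    # does not address (an assumption made for this sketch).
    return 1 + max((order(arg) for arg in item[1:]), default=0)


attracts = ("ATTRACTS", "sun", "planet")
revolves = ("REVOLVES-AROUND", "planet", "sun")
cause = ("CAUSE", attracts, revolves)
```

Under this encoding `order("sun")` is 0, `order(attracts)` is 1, and `order(cause)` is 2, so `CAUSE` is a higher-order relation here because its arguments are themselves statements.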
The degree of relational match is also important in determining people's evaluations of comparisons.
People rate metaphors as more apt when they are based on relational commonalities than when they are
based on common object-descriptions (Gentner, 1988b; Gentner & Clement, 1988). Gentner, Rattermann
and Forbus (1993) asked subjects to rate the soundness and similarity of story pairs that varied in which
kinds of commonalities they shared. Subjects' soundness and similarity ratings were substantially greater
for pairs that shared higher-order relational structure than for those that did not (Gentner & Landers, 1985;
Gentner, Rattermann, & Forbus, in press; Rattermann & Gentner, 1987). Common relational structure
also contributes strongly to judgments of perceptual similarity (Goldstone, Medin & Gentner, 1991) as well
as to the way in which people align pairs of pictures in a mapping task (Markman & Gentner, 1990,
1993b) and determine common and distinctive features (Gentner & Markman, 1994; Markman & Gentner,
1993a).

Any model of human similarity and analogy must capture this sensitivity to structural commonality. To do
so, it must involve structural representations and processes that operate to align them (Barnden, 1994;
Gentner & Markman, in press; Goldstone, Medin & Gentner, 1991; Holyoak, Novick, & Melz, 1994;
Keane, 1988a,b; Markman & Gentner, 1993; Medin, Goldstone, & Gentner, 1993; Reed, 1987; Reeves &
Weisberg, 1994). This would seem to require abandoning some highly influential models of similarity: e.g., 
modeling similarity as the intersection of independent feature sets or as the dot product of feature vectors. 
However, we show below that a variant of these nonstructural models can be useful in describing memory 
retrieval. 
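
To make the limitation of the nonstructural measures concrete, the sketch below computes both a feature-set intersection and a feature-vector dot product for two descriptions that contain the same objects and relation but with the roles reversed. The feature names and both function names are assumptions chosen for illustration.

```python
from typing import Dict, Set


def feature_overlap(a: Set[str], b: Set[str]) -> int:
    """Similarity as the size of the intersection of independent feature sets."""
    return len(a & b)


def dot_product(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Similarity as the dot product of sparse feature vectors."""
    return sum(weight * v.get(feature, 0.0) for feature, weight in u.items())


# "Dog bites man" and "man bites dog" have identical features;
# only the bindings of objects to roles differ.
dog_bites_man = {"dog", "man", "bites"}
man_bites_dog = {"man", "dog", "bites"}

as_vector = lambda features: {f: 1.0 for f in features}
overlap = feature_overlap(dog_bites_man, man_bites_dog)
product = dot_product(as_vector(dog_bites_man), as_vector(man_bites_dog))
```

Both measures give the maximum possible score for this pair, since neither encoding records which object fills which role. That is the sense in which such measures are insensitive to structure.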

2.1.1 Similarity-based Access from Long-term Memory 

There is considerable evidence that access to long-term memory relies more on surface commonalities and 
less on structural commonalities than does mapping. For example, people often fail to access potentially
useful analogs, as in Gick and Holyoak's (1980, 1983) dramatic demonstration. When subjects were told a 
story and then given an analogous problem to solve, about 30% solved the problem. However, if subjects 
were simply told to think about the story they had heard, 80% to 90% solved the problem. We can infer
that most of the subjects retained representations of the prior story sufficient to provide a useful analogy, 
but that hearing the structurally analogous problem did not provide spontaneous access to the story 
representation in memory. Other research has shown that, although people in a problem-solving task are 
often reminded of prior problems, these remindings are often based on surface similarity rather than on 
structural similarities between the solution principles (Holyoak & Koh, 1987; Keane, 1987, 1988b; Novick 
1988; Reed, Ernst, & Banerji, 1974; Ross, 1984, 1987, 1989; see also the comprehensive review by 
Reeves & Weisberg, 1994). 

The experiments we will model here investigated which kinds of similarities led to the best retrieval from 
long-term memory (Gentner & Landers, 1985; Gentner, Rattermann, & Forbus, 1993; Rattermann &
Gentner, 1987). Subjects were first given a relatively large memory set (the "Karla the hawk" stories). 
About a week later, subjects were given new stories that resembled the original stories in various ways and 
were asked to write out any remindings they experienced to the prior stories while reading the new stories. 
Finally, they rated all the pairs for soundness - i.e., how well inferences could be carried from one story to 
the other. The results showed a marked dissociation between retrieval and subjective soundness and
similarity. Surface similarity was the best predictor of memory access and structural similarity was the
best predictor of subjective soundness. This dissociation held not only between subjects, but also within
subjects. That is, subjects given the soundness task immediately after the cued retrieval task judged that
the very matches that had come to their minds most easily (the surface matches) were highly unsound (i.e.,
unlikely to be useful in inference). This suggests that similarity-based access may be based on qualitatively
distinct processes from analogical inferencing.



It is not the case that higher-order relations contribute nothing to retrieval. Adding higher-order relations 
led to nonsignificantly more retrieval in two studies and to a small but significant benefit in the third. Other 
research has shown positive effects of higher-order relational matches on retrieval, especially in cases 
where subjects have been brought to do intensive encoding of the original materials (Faries & Reiser, 
1988) or are expert in the domain (Novick, 1988a,b). But higher-order commonalities have a much bigger 
effect on mapping once the two analogs are present than they do on similarity-based retrieval, and the 
reverse is true for surface commonalities. 

These results place several constraints on a computational model of similarity-based retrieval. The first two
criteria ensure that the model can provide an account of mapping and inference. 

Structured representation criterion: The model must be able to store structured representations. 
Structured mappings criterion: The model must incorporate processes of structural mapping (i.e., 
alignment and transfer) over its representations. 



The remaining four criteria summarize the pattern of retrieval results: 

Primacy of the mundane criterion: The majority of retrievals should be literal similarity matches: i.e.,
matches high in both structural and surface commonalities.

Surface superiority criterion: Retrievals based on surface similarity are frequent.

Rare insights criterion: Relational remindings must occur at least occasionally, with lower frequency than
literal similarity or surface remindings.

Scalability criterion: The model must be plausibly capable of being extended to large memory sizes. 

No current model of transfer succeeds in satisfying all six criteria. There are two major approaches to 
memory models: Indexing models, commonly used in case-based reasoning work, and feature-vector
models, commonly used in mathematical modeling of human memory. We examine the trade-offs of each in 
turn. 

Most case-based reasoning models (Kass, 1986, 1989; Kolodner, 1984, 1988, 1989, 1993; Schank, 1982)
use structured representations, and focus on the process of adapting and applying old cases to new 
situations. Such models satisfy the structured representation and structured mappings criteria. However, 
such models also typically presume a highly indexed memory in which the vocabulary used for indexing 
captures significant higher-order abstractions such as themes and principles. Viewed as psychological 
accounts, these models would predict that people should typically access the best structural match. That 
prediction fails to match the pattern of psychological results summarized by the primacy of the mundane 
and surface superiority criteria. Scalability is also an open question at this time, since no one has yet 
accumulated and indexed a large (1,000 to 10^ corpus of structured representations. 

The reverse set of advantages and disadvantages holds for approaches that model similarity as the result of 
a dot product (or some other simple operation) over feature vectors, as in many mathematical models of 
human memory (e.g., Gillund & Shiffrin, 1984; Hintzman, 1986, 1988; Medin & Schaffer, 1978; but see 
Murphy & Medin, 1985) as well as in many connectionist models of learning (e.g., Smolensky, 1988; see
also the reviews by Humphreys, Bain & Pike (1989) and Ratcliff (1990)). These models typically utilize
nonstructured knowledge representations and relatively simple match processes and hence do not allow for
structural matching and inference. Such models also tend to utilize a unitary notion of similarity, an
assumption that is called into question by the dissociation described above (see also Medin, Goldstone &
Gentner, 1993; Markman & Gentner, 1993). However, the use of feature-vectors has some advantages for
modeling access to long-term memory. The computations are simple enough to make it feasible to compute 
many matches and choose the best, thus satisfying the scalability criterion. Further, because object 
features are included in the feature vectors, these models should be able to capture the surface superiority 
criterion and in many cases the primacy of the mundane criterion. (Failures on the latter will occur for 
cross-mappings, when the objects and relations match but their bindings do not.) It should be noted that 
some case-based reasoning work also restricts itself to feature-vector representations and thus has the same 
strengths and weaknesses (e.g., Stanfill & Waltz, 1986). 

The MAC/FAC model seeks to combine the advantages of both approaches. We turn now to its 
description. 



3. The MAC/FAC model 

The complexity of the phenomena in similarity-based access suggests a two-stage model. Consider the 
computational constraints on access. The large number of cases in memory and the speed of human access 
suggest a computationally cheap process. But the requirement of judging soundness, essential to
establishing whether a match can yield useful results, suggests an expensive match process. A common
computational solution to such problems is to use a two-stage process, in which a cheap filter is used to
pick out a subset of likely candidates for more expensive processing (cf. Bareiss & King, 1989).
MAC/FAC uses this strategy. The dissociation noted previously can be understood in terms of the
interactions of its two stages. 

Figure 1 illustrates the components of the MAC/FAC model. The inputs are a pool of memory items and a 
probe, i.e., a description for which a match is to be found. The output is an item from memory (i.e., a 
structured description) and a comparison of this item with the probe. (Section 3.1 below describes exactly 
what a comparison is.) Internally there are two stages. The MAC stage provides a cheap but
non-structural filter, which only passes on a handful of items. The FAC stage uses a more expensive but more
accurate structural match to select the most similar item(s) from the MAC output and produce a full
structural alignment. Each stage consists of matchers, which are applied to every input description, and a 
selector, which uses the evaluation of the matchers to select which comparisons are produced as the output 
of that stage. Conceptually, matchers are applied in parallel within each stage. 
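
The overall control flow of the two stages can be sketched in a few lines. This is a rough illustration under several labeled assumptions, not the model's implementation: descriptions are lists of tuple statements, the content vector is taken to be a count of functor occurrences, the MAC selector simply keeps the `mac_keep` best items, and `skeleton_overlap` is a deliberately crude stand-in for the structural matcher (the model itself uses SME). All names are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List

Description = List[tuple]  # assumed: a list of (functor, arg, ...) statements


def functors(item) -> List[str]:
    """Functor of a statement plus the functors of all nested statements."""
    if not isinstance(item, tuple):
        return []
    found = [item[0]]
    for arg in item[1:]:
        found.extend(functors(arg))
    return found


def content_vector(description: Description) -> Counter:
    """Assumed encoding: how often each functor occurs in the description."""
    counts: Counter = Counter()
    for statement in description:
        counts.update(functors(statement))
    return counts


def dot(u: Counter, v: Counter) -> int:
    return sum(count * v[name] for name, count in u.items())


def skeleton(item):
    """A statement with entity names blanked out."""
    if not isinstance(item, tuple):
        return "_"
    return (item[0],) + tuple(skeleton(arg) for arg in item[1:])


def skeleton_overlap(item: Description, probe: Description) -> int:
    """Crude structural score: probe statements whose nested shape recurs."""
    shapes = {skeleton(s) for s in item}
    return sum(skeleton(s) in shapes for s in probe)


def mac_fac(probe: Description, memory: Dict[str, Description],
            structural_score: Callable = skeleton_overlap,
            mac_keep: int = 2) -> str:
    probe_vector = content_vector(probe)
    # MAC: score every memory item cheaply and pass on only a handful.
    ranked = sorted(
        memory,
        key=lambda name: dot(probe_vector, content_vector(memory[name])),
        reverse=True,
    )
    survivors = ranked[:mac_keep]
    # FAC: run the expensive structural matcher on the survivors only.
    return max(survivors, key=lambda name: structural_score(memory[name], probe))


probe = [("CAUSE", ("ATTRACTS", "sun", "planet"), ("REVOLVES", "planet", "sun"))]
memory = {
    "atom": [("CAUSE", ("ATTRACTS", "nucleus", "electron"),
              ("REVOLVES", "electron", "nucleus"))],
    "flat": [("ATTRACTS", "magnet", "nail"), ("REVOLVES", "wheel", "axle"),
             ("CAUSE", "rain", "flood")],
    "unrelated": [("RED", "apple")],
}
best = mac_fac(probe, memory)
```

In this toy memory the items "atom" and "flat" receive the same MAC score, because they contain the same functors, so the cheap stage cannot tell them apart. The structural stage separates them, since only "atom" nests its statements the way the probe does.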

We make minimal assumptions concerning the global structure of long-term memory. We assume here 
only that there is a large pool of descriptions from which we must select one or a few that are most similar 
to a probe. We are uncommitted as to whether the pool is the whole of long-term memory or a subset
selected via some other method, e.g., spreading activation. 
Figure 1: The MAC/FAC model



We begin by describing the FAC stage. In doing so, we also describe the computational framework which 
underlies MAC and FAC, including our conventions for representation and the information about the SME 
algorithm that is required to fully understand MAC/FAC. 

3.1 The FAC stage and SME

The FAC stage takes as input the descriptions selected by the MAC stage, and computes a full structural 
match between each item and the probe. We model the FAC stage by using SME, the Structure-Mapping 
Engine (Falkenhainer, Forbus & Gentner, 1986, 1989). Here we briefly summarize SME's operation, both
by way of describing the FAC stage and to provide the vocabulary needed to motivate the design of the 
MAC stage. 

SME is an analogical matcher designed as a simulation of structure-mapping theory. It takes two inputs, a 
base description and a target description. (For simplicity we speak of these descriptions as being made up 
of items, meaning both objects and statements about these objects.) It computes a set of global 
interpretations of the comparison between base and target. Each global interpretation includes

• A set of correspondences which pair specific items in the base representation to specific items in the 
target. 

• A structural evaluation reflecting the estimated soundness of the match. In subsequent processing, the 
structural evaluation provides one source of information about how seriously to take the match. 

• A set of candidate inferences, potential new knowledge about the target which are suggested by the 
correspondences between the base and target. Candidate inferences are what give analogy its 
generative power, since they represent the importation of new knowledge into the target description. 
However, they are only conjectures; they must be tested and evaluated by other means. 



We can illustrate these ideas with the Rutherford analogy, which describes the structure of the atom in 
terms of that of the solar system. The solar system is the base description and the atom is the target 
description. 

• Rutherford paired the Sun to the nucleus and the planets to the electrons. These correspondences seem 
reasonable not because of intrinsic object similarities but because they allow various relational 
statements also to be placed in correspondence (i.e., aligned): e.g., the relative masses of the objects 
and the fact that the planets/electrons revolve around the Sun/nucleus. 

• This interpretation is a selection from among many common relations. It focuses on the causal system 
of a central gravitational/electromagnetic force, the relative mass of the two bodies within each system,
and the fact that the less massive body revolves around the heavier body. Other common relations,
such as the relative temperatures or differences in color of the two objects, that do not belong to a
common connected system are not included in the interpretation. We refer to this preference for
connected systems of common predicates as the systematicity principle. 

• The preferred interpretation might also sanction new conjectures about the atom, such as that the cause 
of the electrons revolving around the nucleus is the existence of an attractive force.^3

The interpretations produced by SME are structurally consistent, in that they satisfy the constraints of
one-to-one mapping and parallel connectivity, as defined in Section 2.1. These constraints are important
because they allow for the generation of coherent candidate inferences. The systematicity constraint is 
important because it captures the human preference for aligning connected systems of predicates (e.g., 
logical arguments or causal sequences). In addition, SME attempts to find maximal interpretations. An 
interpretation is maximal if adding any additional correspondences would render it structurally inconsistent. 
Maximality is important both because it reduces the number of possible interpretations and because it 
ensures that the full structural implications of a set of correspondences will be considered. 

Before describing the SME algorithm further, some conventions concerning representation are in order. 
We use infix notation or Lisp prefix syntax for statements as appropriate. We use the term functor of a 
statement as a general term for the relation or function or connective that takes the remaining parts of the 
statement as its arguments. For example, 

1. In GREATER-THAN(HEIGHT(A), HEIGHT(B)), the functor is GREATER-THAN.

2. In NOT(ABOVE(B, A)), the functor is NOT.

3. In HEIGHT(A), the functor is HEIGHT.

4. In RED(A), the functor is RED.



^3 Incorrect candidate inferences are also possible - e.g., that the attractive force in the atom is gravity.
What counts as a candidate inference versus as alignable (or non-alignable) structure depends on the
reasoner's state of knowledge about the target.

Example #1 is an example of a relation. Relations range over truth values, and their arguments can be
entities or other statements. Relations always have multiple arguments, with the exception of logical
connectives (e.g., Example #2), which are always treated as relations regardless of the number of
arguments. For the purposes of structure-mapping, modal operators and other higher-order predicates are
classified as relations. Example #3 is an example of a function, which maps one or more entities into
another entity or constant. In our psychological modeling, functions are often used to represent known
dimensions or components of structured objects (e.g., height, pressure, or color). Example #4 is an
example of an attribute, an atomic description of some property of an entity. Attributes take only one
argument, to capture the notion of a unitary description. This of course does not mean that attributes
cannot be decomposed. For instance, the following forms are logically equivalent:

• RED(A)

• COLOR-OF(A, red)

• COLOR(A) = red

However, we use these three distinct forms to represent distinct psychological constructs. Roughly, the 
first, an attribute, indicates that the subject thinks of redness as a quality of the object. The second, a 
relation, indicates that the subject has to some degree disengaged redness from the object and sees color as 
a relationship between an object and a set of possible values. The third, a function, indicates that the 
subject conceives of color as a dimension of general application and thinks of the color of A as a value 
along this dimension. We view this kind of dimensional representation as important because dimensions 
may in the process of comparison be aligned with quite different dimensions (e.g., HEIGHT and 
DARKNESS). Thus qualities that are conceived of as dimensions are more likely to participate in 
systematic cross-dimensional matches. (For the implications of this idea in analogical development, see 
Gentner & Rattermann, 1990; Gentner, Rattermann, Markman & Kotovsky, in press; Kotovsky & 
Gentner, 1988.) 

With these conventions in mind, let us turn to the SME algorithm. SME operates via a local-to-global 
process. Conceptually, its operation can be divided into four phases. The first phase constructs a network 
of local matches between items in the base and target. The second phase constructs global interpretations 
by coalescing structurally consistent combinations of local matches. The third phase computes the 
structural evaluation, and the fourth phase computes candidate inferences for each interpretation. We 
examine each in turn. 

SME begins by finding all possible local matches between statements in the base and statements in the 
target. A local match is created between base item Bi and target item Tj when either: 

1. Bi and Tj are both statements whose functors are sufficiently alike (typically identical, but see below).

2. Bi and Tj are corresponding arguments of other statements which are connected by a local match and 
are both either objects or functions. 

For instance, given the base item B1 and the target item T1 defined as:

B1: (CAUSE Event17 Event31)
T1: (CAUSE Event5 Event63)

a match would be hypothesized between B1 and T1 because their functors (i.e., CAUSE) are identical. This
local match suggests in turn hypothesizing that Event17 and Event5 match, and also that Event31 and
Event63 match. Each suggested match leads to the creation of new local matches involving the arguments of
the statement if either (a) both are entities (e.g., objects or constants), (b) both are terms involving
functions, which are an indirect means of referring to entities or dimensions, or (c) both are expressions
whose functors match. Here is an example of substitution involving functions:

B2: (PRESSURE Waters 2) 
T2: (TEMPERATURE Brick45)



B2 and T2 could be placed into correspondence if they were the arguments of some other matching pair of 
statements since pressure and temperature are both functions (in this case referring to values on physical 
dimensions of the respective objects). 
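
The rules for creating local matches can be sketched as a short recursive procedure. This is an illustration of the rules as stated above, not the SME algorithm itself: it assumes tuple-encoded statements, a declared set of function functors, simple identicality for all other functors, and seeding from top-level statements only. The entity names in the example are invented.

```python
from typing import List, Set, Tuple

FUNCTIONS = {"PRESSURE", "TEMPERATURE"}  # assumed set of function functors


def is_entity(item) -> bool:
    return not isinstance(item, tuple)


def is_function_term(item) -> bool:
    return isinstance(item, tuple) and item[0] in FUNCTIONS


def same_functor(b, t) -> bool:
    """Simple identicality: same non-function functor, same argument count."""
    return (isinstance(b, tuple) and isinstance(t, tuple)
            and not is_function_term(b) and not is_function_term(t)
            and b[0] == t[0] and len(b) == len(t))


def local_matches(base: List[tuple], target: List[tuple]) -> Set[Tuple]:
    hypotheses: Set[Tuple] = set()

    def propose(b, t) -> None:
        if (b, t) in hypotheses:
            return
        hypotheses.add((b, t))
        if is_entity(b):
            return
        # Corresponding arguments are matched when both are entities,
        # both are function terms, or both have matching functors.
        for b_arg, t_arg in zip(b[1:], t[1:]):
            if ((is_entity(b_arg) and is_entity(t_arg))
                    or (is_function_term(b_arg) and is_function_term(t_arg))
                    or same_functor(b_arg, t_arg)):
                propose(b_arg, t_arg)

    # Seed with statement pairs whose functors are identical.
    for b in base:
        for t in target:
            if same_functor(b, t):
                propose(b, t)
    return hypotheses


base = [("GREATER-THAN", ("PRESSURE", "beaker"), ("PRESSURE", "vial"))]
target = [("GREATER-THAN", ("TEMPERATURE", "coffee"), ("TEMPERATURE", "ice"))]
hypotheses = local_matches(base, target)
```

For this pair the procedure proposes five local matches: the two `GREATER-THAN` statements, the two pairs of non-identical function terms, and the entity pairs beaker/coffee and vial/ice. No match between beaker and ice is proposed, because nothing in the relational structure suggests it.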



The idea that two statements can match only if their relational predicates are "sufficiently alike" is based on
the claim that some common relational content is required in analogy. We disagree with Holyoak and
Thagard's (1989) claim that pure structural isomorphisms can qualify as analogies. They present the
following pair:

Bill is smart and tall.              Rover is hungry and friendly.
Steve is smart.                      Fido is hungry.
Tom is timid and tall.               Blackie is frisky and friendly.

Holyoak and Thagard (1989, p. 343) note that ACME (and five out of the eight subjects tested) could
match this pair and agree on the best attribute correspondences. But the fact that it can be solved is not
decisive: we would suggest that it is taken as a logical puzzle to be solved for the best correspondences, not
as an analogy. The trouble with accepting pure graph matches is that it leads to the claim that pairs like (1)
and (2) below are analogies, which seems patently untrue:

(1) Fred loves New York. (2) General Motors sells cars.

Note that it is the relational meaning that must be shared; (2) and (3) form an analogy but (2) and (4) do
not:

(3) Fred peddles popsicles. (4) General Motors heads the list.

The question is how to formalize this requirement of common relational content. Structure-mapping uses
the idea of tiered identicality. The default criterion for "sufficiently alike" for predicates other than
functions is that the predicates are identical. We call this the simple identicality criterion. Simple
identicality of conceptual relations is an excellent first-pass criterion because it is computationally cheap.
The notion of simple identicality might suggest an inability to process any matches other than literal
matches. This is not the case. First, we assume that input representations are canonical conceptual
representations, not semi-verbal strings. Second, functions, which represent domain dimensions, can be
matched non-identically if they are embedded in matching relational structure. This ability to align
non-identical functions provides considerable flexibility. This is what allows SME to make cross-dimensional
matches, as when we interpret "Sally is sharper than Bill" to mean that Sally is smarter than Bill.
However, there are circumstances where criteria requiring more processing are worthwhile (e.g., when
placing two items in correspondence would allow a larger, or very relevant, structure to be mapped, as in
Falkenhainer's (1987, 1990) work). In these circumstances weaker criteria (in that they allow more items
to match) that involve more processing are allowed. One such test is minimal ascension (Falkenhainer,
1987, 1990) which allows two items to be placed into correspondence if their predicates have close
common superordinates. Another technique is decomposition: Two concepts that are similar but not
identical (such as "bestow" and "bequeath") are decomposed into a canonical representation language so
that their similarity is expressed as a partial identity (here, roughly "give"). Decomposition is the simplest
form of re-representation (Gentner, 1989; Gentner & Rattermann, 1991), where additional knowledge is
used to reformulate a description in order to achieve a better match. In this paper we only use SME with
the first-level identicality constraint. As Section 6 argues, this simple constraint seems to provide a better
psychological account than more complex constraints do.

The process of using matches to propose lower matches is recursive, ending with entity matches. SME 
does not try matches between every pair of objects in base and target: It only hypothesizes object matches 
when there is some aspect of the relational structure which suggests that the objects might correspond. 
This leads to substantial efficiencies over purely bottom-up matchers, such as that of Winston (1980). 

The output of the first phase is a network of match hypotheses, each representing a local match between an 
item of the base and target. At this stage the network is incoherent. The set of correspondences taken as a 
whole is structurally inconsistent, often including N-to-one mappings. Further, this initial network may 
contain match hypotheses that are not grounded and so can never be part of any global interpretation. A 
match hypothesis is grounded if a recursive chain of correspondences from it through its arguments exists 
all the way down to entities. Only grounded match hypotheses can participate in global interpretations. 
Otherwise, global interpretations might include statements whose arguments did not match, which would 
violate the parallel connectivity constraint. 
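
The notions of a local match and of groundedness can be made concrete with a small Python sketch. It is only an illustration under simplifying assumptions, not SME itself: expressions are written as nested tuples, match hypotheses are proposed only between expressions with identical functors (the simple identicality criterion), and the special treatment of functions is ignored. The function names are hypothetical. The data are two CAUSE statements from Jack's description and one from Grant's, as given in Figure 3.

```python
from itertools import product

# Expressions are nested tuples (FUNCTOR, arg1, ...); entities are plain strings.
BASE = [  # two CAUSE statements from Jack's description
    ("CAUSE", ("TYPING-AT", "JACK", ("KEYBOARD", "ROBOTJ")),
              ("REPAIRING", "ROBOTJ", "CAR54")),
    ("CAUSE", ("REPAIRING", "ROBOTJ", "CAR54"),
              ("USING", "ROBOTJ", "HANDTOOLSJ")),
]
TARGET = [  # the CAUSE statement from Grant's description
    ("CAUSE", ("REPAIRING", "GRANT", "ROBOTG"),
              ("USING", "GRANT", "HANDTOOLSG")),
]


def is_entity(item):
    return isinstance(item, str)


def subexpressions(statements):
    """Collect every distinct expression, including nested ones."""
    seen = set()

    def walk(expr):
        if is_entity(expr) or expr in seen:
            return
        seen.add(expr)
        for arg in expr[1:]:
            walk(arg)

    for statement in statements:
        walk(statement)
    return seen


def grounded(b, t):
    """True if b and t can be paired argument by argument down to entities."""
    if is_entity(b) and is_entity(t):
        return True
    if is_entity(b) or is_entity(t):
        return False
    if b[0] != t[0] or len(b) != len(t):
        return False
    return all(grounded(x, y) for x, y in zip(b[1:], t[1:]))


def match_hypotheses(base, target):
    """Map each (base, target) pair with identical functors to its groundedness."""
    return {
        (b, t): grounded(b, t)
        for b, t in product(subexpressions(base), subexpressions(target))
        if b[0] == t[0] and len(b) == len(t)
    }


if __name__ == "__main__":
    for (b, t), ok in match_hypotheses(BASE, TARGET).items():
        print(b[0], "grounded" if ok else "not grounded")
```

In this sketch the CAUSE statement involving TYPING-AT is paired with Grant's CAUSE statement but is not grounded, because TYPING-AT and REPAIRING cannot be placed in correspondence; the other CAUSE pairing is grounded.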

Looking at a simple example makes this process clearer. Figure 2 shows two drawings used in 
psychological experiments concerning analogy,⁴ with a propositional representation of these pictures 
suitable for simulation shown in Figure 3. The right hand side of Figure 3 shows the propositions in 
standard logical format, while the left hand side contains an equivalent graphical representation which is 
useful for understanding the match process. Figure 4 illustrates the match hypotheses computed by SME 
for these descriptions. 



⁴ We thank Arthur Markman for the drawings of Figure 2 and the corresponding representations. 
Figure 2: Two simple situations 

Even though the initial network of match hypotheses is structurally inconsistent, it contains every consistent 
interpretation of the match; global interpretations emerge out of the initial network. Thus the maximum size 
of any global interpretation, as measured in number of correspondences, is limited by the size of this 
network. We exploit this fact in Section 3.3. 



Jack's Robot Repair description: 

(REASON (REPAIRING ROBOTJ CAR54) 
        (BROKEN CAR54)) 
(CAUSE (TYPING-AT JACK (KEYBOARD ROBOTJ)) 
       (REPAIRING ROBOTJ CAR54)) 
(CAUSE (REPAIRING ROBOTJ CAR54) 
       (USING ROBOTJ HANDTOOLSJ)) 
(DOOR DOORJ) 
(JOINTED ROBOTJ) 
(METALLIC ROBOTJ) 
(ROBOT ROBOTJ) 
(PERSON JACK) 

Grant's Robot Repair description: 

(REASON (REPAIRING GRANT ROBOTG) 
        (BROKEN ROBOTG)) 
(CAUSE (REPAIRING GRANT ROBOTG) 
       (USING GRANT HANDTOOLSG)) 
(TOOL-SET HANDTOOLSG) 
(DOOR DOORG) 
(JOINTED ROBOTG) 
(METALLIC ROBOTG) 
(ROBOT ROBOTG) 
(PERSON GRANT) 

Here are two predicate calculus descriptions given to SME to illustrate the algorithm's operation. 
Figure 3: Sample Descriptions 



In the second phase, these local matches are coalesced into global interpretations. The SME algorithm 
combines structurally consistent combinations of match hypotheses (i.e., sets with consistent object 
bindings and consistent relational argument assignments). For instance, in Figure 4 there are two match 
hypotheses involving Grant, one which places him in correspondence with Jack because person is true of 
both of them, and another match hypothesis which places Grant in correspondence with ROBOTJ, because 
both are agents of the same kind of action, repairing. No interpretation of this comparison can include 
both of these match hypotheses. Merging can be done exhaustively, producing all possible interpretations 
(as in Falkenhainer, Forbus & Gentner 1986, 1989); however, we normally use a more psychologically 
plausible greedy merge algorithm, which produces only one or two interpretations and operates in linear 
time (Forbus & Oblinger 1990; Forbus, Ferguson, & Gentner, 1994). 
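
The flavor of a greedy merge can be conveyed by the following Python sketch. It should not be read as the algorithm of Forbus & Oblinger (1990): it assumes that the grounded match hypotheses have already been grouped into scored sets of object bindings (called kernels here, a hypothetical name), it sorts them for brevity rather than running in linear time, and the scores in the example are invented. It simply adds the best-scoring kernels first and skips any kernel that would violate the 1:1 constraint.

```python
def consistent(bindings, new):
    """1:1 check: no base entity gets two targets, no target gets two bases."""
    for base_entity, target_entity in new.items():
        if bindings.get(base_entity, target_entity) != target_entity:
            return False
        if any(b != base_entity and t == target_entity
               for b, t in bindings.items()):
            return False
    return True


def greedy_merge(kernels):
    """kernels: list of (score, {base_entity: target_entity}) pairs.

    Returns the merged object bindings and the summed score.
    Each kernel is assumed to be internally consistent.
    """
    merged, total = {}, 0.0
    for score, binds in sorted(kernels, key=lambda k: k[0], reverse=True):
        if consistent(merged, binds):
            merged.update(binds)
            total += score
    return merged, total


# Hypothetical kernels for the Jack/Grant comparison (scores are made up).
KERNELS = [
    (2.0, {"ROBOTJ": "GRANT", "CAR54": "ROBOTG", "HANDTOOLSJ": "HANDTOOLSG"}),
    (0.5, {"JACK": "GRANT"}),      # both are PERSONs
    (0.5, {"ROBOTJ": "ROBOTG"}),   # both are ROBOTs
    (0.5, {"DOORJ": "DOORG"}),
]

if __name__ == "__main__":
    print(greedy_merge(KERNELS))
```

With these invented scores the relational kernel is taken first, so the two attribute-based kernels that conflict with it are skipped while the door correspondence is kept.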




This picture illustrates the match hypotheses generated for a pair of simple descriptions. Match hypotheses 
are shown as triangles. Dashed lines indicate the base and target items each match hypothesis places in 
correspondence. The solid arrows leaving a match hypothesis indicate which others it relies upon to be 
structurally consistent. Notice that one of the match hypotheses involving the occurrence of CAUSE in the 
target is structurally inconsistent, because its arguments cannot be aligned. 

Figure 4: A match hypothesis forest 
The third phase is structural evaluation. For simplicity, we describe this stage as conceptually distinct 
from the previous stage, although it is actually interleaved with building interpretations, since its results 
guide the greedy merge algorithm. To capture human preferences, the structural evaluation computation 
should favor interpretations with many matches over those with few matches and deep interpretations over 
shallow interpretations. The first step is to assign an initial score to every match hypothesis. This helps 
enforce the size preference. The systematicity preference is implemented via a trickle-down method: match 
hypothesis scores are passed down to increment the scores of matching arguments.⁵ That is, if $W(MH_1)$ is 
the score associated with a match hypothesis $MH_1$, $MH_2$ is a match hypothesis that applies to one of $MH_1$'s 
arguments, and $\delta$ is the trickle-down factor, then $W(MH_2)$ is incremented as follows: 

$$W(MH_2) \leftarrow \max\{W(MH_2) + \delta\, W(MH_1);\ 1.0\}$$

This local computation causes scores to cascade downwards, providing higher values to those object 
correspondences which support the alignment of large relational structures. The structural evaluation of a 
global interpretation is simply the sum of the scores of the match hypotheses which comprise its 
correspondences. 



GM1: 10 correspondences, SES = 4.66 
Object mappings: 
    DOORG <-> DOORJ 
    ROBOTG <-> CAR54 
    GRANT <-> ROBOTJ 
    HANDTOOLSG <-> HANDTOOLSJ 
No candidate inferences. 

Here is a summary of the best interpretation for this match found by SME. SES refers to the structural 
evaluation of the interpretation. 

Figure 5: Global interpretations for the example 



The final phase is the computation of candidate inferences. Computing candidate inferences requires 
knowing the set of correspondences, so this takes place after the merge operation. Candidate inferences are 
generated by finding non-corresponding relational structure in the base which can be conjectured to hold in 
the target. The global interpretations built for the comparison of Figure 3 are shown in Figure 5. In this 
simple example there are no candidate inferences. 

It is important to note that the literal similarity computation can produce purely relational interpretations 
and purely surface interpretations as well as overall similarity interpretations. 
It is simply a question of which collection of local matches wins. This reflects the human ability to process 
a novel comparison and discover only after the fact that it is an analogy. We assume that this all-purpose 
literal similarity mode is the normal mode of similarity processing in the absence of specific instructions. 
Consequently, SME creates initial local matches for attribute statements as well as for relational 
statements. 



⁵ The systematicity preference could have been implemented by differentially weighting matches at different 
levels. This method would seem to require a computationally implausible 'bird's eye' view of the 
representations. In a comparison of the two methods, the trickle-down method accounted for human 
soundness ratings better than treating weights directly as a function of order (Forbus & Gentner, 1989). 



For SME to play a major role in a model of similarity-based retrieval, it should be consistent with 
psychological evidence. We have tested the psychological validity of SME as a simulation of analogical 
processing in several ways. For instance, we compared SME's structural evaluation scores with human 
soundness ratings for the Karla the hawk stories discussed below (Gentner & Landers, 1985; Rattermann 
& Gentner, 1987). Like human subjects, SME rated analogical matches higher than surface matches 
(Skorstad, Falkenhainer, and Gentner, 1987). The patterns of preference were similar across story sets: 
there was a significant positive correlation between the difference scores for SME and those for human 
subjects, where the difference score is the rating for analogy minus the rating for surface match within a 
given story set (Gentner, Rattermann & Forbus, 1993). 

Since retrievals occur frequently, components in a model of retrieval must be efficient. SME is quite 
efficient. The generation of match hypotheses is O(n^2) on a serial machine, where n is the number of items 
in base or target, and should typically be better than O(log(n)) on data-parallel machines.⁶ The generation 
of global interpretations is roughly O(log(n)) on a serial machine, using the greedy merge algorithm of 
Forbus & Oblinger (1990),⁷ and even faster parallel merge algorithms seem feasible. 

3.2 The FAC stage 

The FAC stage is essentially a bank of SME matchers, all running in parallel in literal similarity mode.⁸ 
These take as input the memory descriptions that are passed forward by the MAC stage and compute a 
structural alignment between each of these descriptions and the probe. The other component of the FAC 
stage is a selector (currently a numerical threshold) which chooses some subset of these comparisons to 
be available as the output of the retrieval system (see Figure 1). 

The FAC stage acts as a structural filter. It captures the human sensitivity to structural alignment and 
inferential potential (subject to the limited and possibly surface-heavy set of candidates provided by the 
MAC stage, as described below). Several remarks on this algorithm's role in retrieval are in order. We use 
the literal similarity algorithm, on the grounds that in reminding situations people can respond to and 
identify different kinds of similarity. (Recall that the literal similarity computation can compute relational 
similarity or object similarity as well as overall similarity.) This choice seems ecologically sound because 
mundane matches are often reasonable guides to action; riding a new bicycle, for instance, is like riding 
other bicycles (Forbus & Gentner, 1986; Gentner, 1989; Medin & Ortony, 1989; Medin & Ross, 1989). 
Finally, this choice is necessary to model the high observed frequency of surface remindings. These surface 
remindings would mostly be rejected if FAC were strictly an analogy matcher. 

The selector for the FAC stage must choose a small set of matches for subsequent processing. Currently we 
select as output the best match, based on its structural evaluation, and any others within 10% of it. We 
settled on the 10% criterion because it generally returns a single result, only producing multiple results 
when there are two extremely close candidates. However, other criteria are possible and we have 
experimented with broadening the percentage, selecting a fixed number, selecting a maximum number (if 
capacity limits were assumed) and so forth. (One class of these experiments is described in Section 5.) We 
have also considered adding a threshold to the selector, so that if the best outcome is too weak the 
retrieval system returns nothing. 
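
The selector and the variants just mentioned can be sketched in a few lines of Python. This is an illustration only: the function and parameter names are hypothetical, "within 10% of the best" is read here as a score of at least 0.9 times the best score, and scores are assumed to be positive.

```python
def select(scores, within=0.10, max_items=None, floor=None):
    """Return the best-scoring item and any others within `within` of it.

    scores    -- dict mapping a memory item to its (positive) score
    within    -- fraction of the best score that still counts as close
    max_items -- optional capacity limit on the number of items returned
    floor     -- optional threshold; if the best score is below it,
                 nothing is returned
    """
    if not scores:
        return []
    best = max(scores.values())
    if floor is not None and best < floor:
        return []
    cutoff = best * (1.0 - within)
    chosen = sorted((item for item, s in scores.items() if s >= cutoff),
                    key=lambda item: scores[item], reverse=True)
    return chosen if max_items is None else chosen[:max_items]


if __name__ == "__main__":
    # Invented structural evaluation scores for three memory items.
    print(select({"story-a": 4.66, "story-b": 4.30, "story-c": 2.00}))
```

Broadening the percentage corresponds to raising `within`, and the capacity-limit and threshold variants correspond to `max_items` and `floor`.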



⁶ The worst case parallel time would be O(n), in degenerate cases where all but one of the local matches is 
proposed by matching arguments. 

⁷ The original exhaustive merge algorithm was worst case factorial in the number of "clumps" of match 
hypotheses, but in practice was often quite efficient. See Falkenhainer, Forbus, & Gentner (1989) for 
details. 

⁸ In our current implementation SME is run sequentially on each candidate item in turn, but this is an 
artifact of the implementation. 
3.3 The MAC stage 

The MAC stage collects the initial set of matches between the probe and memory. Like the FAC stage, the 
MAC stage conceptually consists of a set of matchers and a selector that simply returns all items whose 
MAC score is within 10% of the best score given that probe. The challenge of the MAC stage is in the 
design of its matcher. It must quickly compare, in parallel, the probe to a large pool of descriptions 
and pass only a few on to the more expensive FAC stage. The rest of this section describes the design 
and implementation of the MAC matcher. 

Let us start by examining in more detail the design criteria the MAC matcher must satisfy. Ideally, we 
would like the most similar or apt memory item for the given probe. Clearly running SME on the probe 
and every item in memory would provide the most accurate result. Unfortunately, even though SME is very 
efficient, it isn't efficient enough. SME operates by building intermediate structure, in the form of the 
network of local matches. The idea of building such networks for a pair of items, or a small number of 
pairs of items, is psychologically plausible, because the size of the match hypothesis network is polynomial 
in the size of the descriptions being matched. This means, depending on one's implementation assumptions, 
that a fixed-size piece of hardware could be built which could be dynamically reconfigured to represent any 
local match network for input descriptions of some bounded size. What is not plausible is that such 
networks could be built between a probe and every item in a large memory pool, and especially that this 
could happen quickly enough in neural architectures to account for observed retrieval times. 

This architectural argument suggests that, while SME in literal similarity mode is fine for FAC, MAC must 
be made of simpler stuff. To escape having to suffer the complexity of the most accurate matcher in the 
"innermost loop" of retrieval, we must trade accuracy for efficiency. The MAC matcher must provide a 
crude, computationally cheap match process to pare down the vast set of memory items into a small set of 
candidates for more expensive processing. Ideally, MAC's computations should be simple enough to admit 
plausible parallel and/or connectionist implementations for large-scale memory pools. 

What is the appropriate crude estimator of similarity? The most straightforward method would be to count 
the number of match hypotheses that FAC would generate in comparing a probe to a memory item. Let us 
call this number the numerosity of a comparison. Numerosity bears a rough relation to the potential size of 
the global interpretation, since the more local matches there are, the larger the global interpretation could 
potentially be. However, a large number of match hypotheses does not guarantee a large global 
interpretation, for two reasons. First, many match hypotheses might be ungrounded (recall Section 3.1) 
and hence cannot be part of any global interpretation. Second, often many combinations of match 
hypotheses are ruled out by the 1:1 constraint, preventing the formation of large global interpretations. 
Both reasons follow directly from the fact that numerosity is not structurally sensitive. However, 
something like numerosity is at least a crude estimate of similarity. 

One straightforward way to implement a rough similarity estimator would be to calculate numerosity by 
building the actual match hypothesis network (i.e., to carry out the first part of a full analogy process) for 
the probe and each memory item, and then count the match hypotheses. This is what our original version 
of MAC/FAC did (Gentner, 1989b). It also is roughly what ARCS (Thagard et al., 1990) does. ARCS 
models retrieval by building a network of connections similar to SME's match hypothesis network between 
the probe and each item in the memory pool that shares a semantically similar predicate with it.⁹ As just 
discussed, we view these solutions as psychologically and computationally implausible. Even with parallel 
and/or neural hardware, it is hard to see how to generate match hypothesis networks between a probe and 
everything in a large pool of memory, while still providing realistic response times. A cheaper method is 
required. 

⁹ ARCS is based on Holyoak & Thagard's (1989) ACME, an analogy matcher which uses a localist 
connectionist network similar to SME's match hypothesis network to construct a single interpretation of a 
comparison via constraint satisfaction. 

We have developed a novel technique for estimating the degree of match in which structured 
representations are encoded as content vectors. Content vectors are flat summaries of the knowledge 
contained in complex relational structures. The content vector for a given description specifies which 
functors (i.e., relations, connectives, object attributes, functions, etc.) were used in that description and the 
number of times they occurred. Content vectors are assumed to arise automatically from structured 
representations and to remain associated with them. Content vectors are a special form of feature vectors. 

More precisely, let $\Pi$ be the set of functors used in the descriptions that constitute memory items and 
probes. We define the content vector of a structured description as follows. A content vector is an $n$-tuple 
of numbers, each component corresponding to a particular element of $\Pi$. Given a description $\phi$, the value 
of each component of its content vector indicates how many times the corresponding element of $\Pi$ occurs in 
$\phi$. Components corresponding to elements of $\Pi$ which do not appear in statements of $\phi$ have the value zero. 
One simple algorithm for computing content vectors is simply to count the number of occurrences of 
each functor in the description. Thus if there were four occurrences of IMPLIES in a story, the value for the 
IMPLIES component of its content vector would be four. (Figure 6 illustrates.) Thus content vectors are easy 
to compute from a structured representation and can be stored economically (using sparse encoding, for 
instance, on serial machines). 
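
As a minimal sketch of the counting algorithm, the following Python code computes a sparse content vector for the solar system description of Figure 6. It rests on one assumption that should be stated plainly: identical subexpressions, such as the two mentions of (MASS SUN), are treated as a single shared expression and counted once, since that is the reading under which the counts come out as in Figure 6. The nested-tuple encoding and the function name are illustrative, and the OBJECTS entry shown in the figure is not computed.

```python
from collections import Counter

# The solar system description of Figure 6 as nested tuples.
SOLAR_SYSTEM = [
    ("CAUSE", ("GRAVITY", ("MASS", "SUN"), ("MASS", "PLANET")),
              ("ATTRACTS", "SUN", "PLANET")),
    ("GREATER", ("TEMPERATURE", "SUN"), ("TEMPERATURE", "PLANET")),
    ("CAUSE", ("AND", ("GREATER", ("MASS", "SUN"), ("MASS", "PLANET")),
                      ("ATTRACTS", "SUN", "PLANET")),
              ("REVOLVE-AROUND", "PLANET", "SUN")),
]


def content_vector(description):
    """Count how many distinct expressions use each functor.

    Returns a sparse vector (a Counter); functors that never occur are
    simply absent, which stands for a component value of zero.
    """
    distinct = set()

    def walk(expr):
        # Entities are strings; only tuples are expressions with a functor.
        if isinstance(expr, tuple) and expr not in distinct:
            distinct.add(expr)
            for arg in expr[1:]:
                walk(arg)

    for statement in description:
        walk(statement)
    return Counter(expr[0] for expr in distinct)


if __name__ == "__main__":
    for functor, count in sorted(content_vector(SOLAR_SYSTEM).items()):
        print(f"({functor} . {count})")
```

A Counter is used so that the vector is sparse by construction, in the spirit of the sparse encoding mentioned above.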



Solar System: Structured representation 

(CAUSE (GRAVITY (MASS SUN) (MASS PLANET)) 
       (ATTRACTS SUN PLANET)) 
(GREATER (TEMPERATURE SUN) 
         (TEMPERATURE PLANET)) 
(CAUSE (AND (GREATER (MASS SUN) 
                     (MASS PLANET)) 
            (ATTRACTS SUN PLANET)) 
       (REVOLVE-AROUND PLANET SUN)) 

Rutherford Atom: Structured representation 

(CAUSE (OPPOSITE-SIGN (CHARGE NUCLEUS) 
                      (CHARGE ELECTRON)) 
       (ATTRACTS NUCLEUS ELECTRON)) 
(REVOLVE-AROUND ELECTRON NUCLEUS) 
(GREATER (MASS NUCLEUS) 
         (MASS ELECTRON)) 

Solar System: Content Vector 

(AND . 1) 
(ATTRACTS . 1) 
(CAUSE . 2) 
(GRAVITY . 1) 
(GREATER . 2) 
(MASS . 2) 
(OBJECTS . 2) 
(REVOLVE-AROUND . 1) 
(TEMPERATURE . 2) 

Rutherford Atom: Content Vector 

(ATTRACTS . 1) 
(CAUSE . 1) 
(CHARGE . 2) 
(GREATER . 1) 
(MASS . 2) 
(OBJECTS . 2) 
(OPPOSITE-SIGN . 1) 
(REVOLVE-AROUND . 1) 

Here are some simple predicate calculus representations and the corresponding content vectors. A simple 
counting algorithm is used here; in the simulation these are normalized to unit vectors. 

Figure 6: Sample representations with content vectors 

How good an approximation is the content vector dot product to what SME would produce? Suppose 
content vectors were generated using the simple counting algorithm described above. Then the product of 
each corresponding component is an overestimate of the number of match hypotheses that would be created 
between functors of that type, because it does not take into account the cases when the arguments to the 
match hypotheses could not be aligned. There is also a possibility of underestimation, since the dot product 
does not take into account matches between non-identical functions and entities, since discovering those 
matches requires tracing predicate bindings. However, in practice, the number of entity and non-identical 
function matches tends to be smaller than the number of ungrounded matches, so overall the dot product 
tends to overestimate numerosity, and hence will tend to be an overestimate of what SME would produce. 

The dot product of content vectors provides exactly the computational basis the MAC stage needs. It could 
be implemented efficiently for large memories using a variety of massively parallel computation schemes. 
For instance, connectionist memories can be built which find the closest feature vector to a probe (Hinton 
& Anderson, 1989). Therefore the MAC stage can scale up. 

To summarize, the MAC matcher works as follows: Each memory item has a content vector stored with 
it.¹⁰ When a probe enters, its content vector is computed. A score is computed for each item in the memory 
pool by taking the dot product of its content vector with the probe's content vector. The MAC selector then 
produces as output the best match and everything within 10% of it, as described above. (As for the FAC 
stage, variants that could be considered include adding a bound on the number of items returned, to model 
capacity limitations, and implementing a threshold on the MAC selector so that if every match is too low 
MAC returns nothing.) 
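
The summary above can be sketched in Python as follows. This is an illustration under stated assumptions, not the simulation code: content vectors are stored as sparse dictionaries, they are normalized to unit length before the dot product is taken, and "within 10%" is read as at least 0.9 times the best score. The two named vectors are taken from Figure 6 without the OBJECTS entry; the distractor vector and all function names are hypothetical.

```python
import math

# Content vectors from Figure 6 (OBJECTS entry omitted).
SOLAR_SYSTEM = {"AND": 1, "ATTRACTS": 1, "CAUSE": 2, "GRAVITY": 1,
                "GREATER": 2, "MASS": 2, "REVOLVE-AROUND": 1,
                "TEMPERATURE": 2}
RUTHERFORD = {"ATTRACTS": 1, "CAUSE": 1, "CHARGE": 2, "GREATER": 1,
              "MASS": 2, "OPPOSITE-SIGN": 1, "REVOLVE-AROUND": 1}
# A made-up distractor description, for illustration only.
DISTRACTOR = {"CAUSE": 1, "HUNGRY": 2, "EATS": 1}


def normalize(vec):
    """Scale a sparse vector to unit length (empty if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else {}


def dot(u, v):
    """Dot product of two sparse vectors; absent components count as zero."""
    if len(u) > len(v):
        u, v = v, u
    return sum(x * v.get(k, 0.0) for k, x in u.items())


def mac(probe_vec, memory, within=0.10):
    """Score every memory item against the probe and apply the selector.

    memory -- dict mapping an item name to its stored content vector
    Returns (names passed on to FAC, dict of all scores).
    """
    probe = normalize(probe_vec)
    scores = {name: dot(probe, normalize(vec)) for name, vec in memory.items()}
    best = max(scores.values())
    passed = [name for name, s in scores.items() if s >= best * (1.0 - within)]
    return passed, scores


if __name__ == "__main__":
    memory = {"solar-system": SOLAR_SYSTEM, "distractor": DISTRACTOR}
    print(mac(RUTHERFORD, memory))
```

Because each score depends only on the probe and one stored vector, the scoring loop is the part that could be carried out in parallel across memory.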

Like other feature-vector schemes, the dot product of content vectors does not take the actual relational 
structure into account. It only calculates a numerical score, and hence doesn't produce the correspondences 
and candidate inferences which provide the power of analogical reasoning and learning. But the output of 
MAC feeds to the FAC stage, which operates on structured representations. Thus it is the FAC stage which 
both filters out structurally unsound remindings and produces the desired correspondences and candidate 
inferences. We claim that the interplay of the cheap but dumb computations of the MAC stage and the 
more expensive but structurally sensitive computations of the FAC stage explains the psychological 
phenomena of Section 2. As the first step in supporting this claim, we next demonstrate that MAC/FAC's 
behavior provides a good approximation of psychological data. 



4. Cognitive Simulation Experiments 

In this section we compare the performance of MAC/FAC with that of human subjects, using the "Karla 
the Hawk" stories (Gentner, Rattermann & Forbus, 1993; Rattermann & Gentner, 1987, Experiment 2). 
For these studies, we wrote sets of stories consisting of base stories plus four variants, created by 
systematically varying the kind of commonalities. All stories shared first-order relations (primarily events), 
but varied in which other commonalities were present, as shown in Table 1. The LS (literal similarity) 
stories shared both higher-order relational structure and object attributes. The AN stories shared 
higher-order relational structure but contained different attributes, while the SF stories shared attributes but 
contained different higher-order relational structure. The FOR stories differed both in attributes and 
higher-order relational structure. 

¹⁰ We normalize content vectors to unit vectors, both to reduce the sensitivity to overall size of the 
descriptions and because we assume that psychologically plausible implementation substrates for 
MAC/FAC (e.g., neural systems) will involve processing units of limited dynamic range. 





          Common              Common            Common 
          1st o. relations    h.o. relations    object attributes 
LS:       Yes                 Yes               Yes 
SF:       Yes                 No                Yes 
AN:       Yes                 Yes               No 
FOR:      Yes                 No                No 

Table 1: Types of stories used in the "Karla the Hawk" experiments 



In this study, the subjects were first given 32 stories to remember, of which 20 were base stories and 12 
were distractors. They were later presented with 20 probe stories which matched the base stories as 
follows: five LS matches, five AN matches, five SF matches and five FOR matches. They were told to 
write down any prior stories of which they were reminded. (Which stories were in each similarity condition 
was varied across subjects.) As shown in Table 2, the proportions of remindings for different match types 
were .56 for LS, .53 for SF, .12 for AN and .09 for FOR. Table 2 also shows that this retrievability order 
has been stable across three variations of this study: LS > SF > AN > FOR.¹¹ 



Condition    Proportion 
LS           0.56 
SF           0.53 
AN           0.12 
FOR          0.09 

Table 2: Proportion of remindings for different match types: Human subjects 



As discussed above, this retrievability order differs strikingly from the soundness ordering. When subjects 
were asked to rate how sound the matches were - how well the inferences from one story would apply to 
the other - they rated analogy (AN) and literal similarity (LS) as significantly more sound than surface 
similarity (SF) and FOR matches (matches based only on common first-order relations, primarily events). 
SME running in analogy mode on SF and AN matches correctly reflected human soundness rankings 
(Forbus & Gentner, 1989; Gentner et al., in press; Skorstad et al., 1987). Here we seek to capture human 
retrieval patterns: Does MAC/FAC duplicate the human propensity for retrieving SF and LS matches 
rather than AN and FOR matches? The idea is to give MAC/FAC a memory set of stories, then probe with 
various new stories. To count as a retrieval, a story must make it through both MAC and FAC. We use 
replication of the ordering found in the psychological data, rather than the exact percentages, as our 
criterion for success because this measure is more robust, being less sensitive to the detailed properties of 
the databases. 

¹¹ LS and SF did not differ significantly in retrievability. In Experiment 2, AN and FOR did not differ 
significantly, although in Experiment 1 AN matches were better retrieved than FOR matches. 

For the computational experiments, we encoded predicate calculus representations for 9 of the 20 story sets 
(45 stories). Figure 7 shows one of the story representations. These stories are used in all three 
experiments described below. 

Figure 7: A representation from the Karla The Hawk story set 

(FOLLOW (SEE KARLA MAN1) 
        (ATTACK MAN1 KARLA)) 
(FOLLOW (ATTACK MAN1 KARLA) 
        (EQUALS (SUCCESS (ATTACK MAN1 KARLA)) F)) 
(CAUSE (NOT (USED-FOR FEATHERS CROSS-BOW)) 
       (EQUALS (SUCCESS (ATTACK MAN1 KARLA)) F)) 
(FOLLOW (EQUALS (SUCCESS (ATTACK MAN1 KARLA)) F) 
        (REALIZE KARLA (DESIRE MAN1 FEATHERS))) 
(CAUSE (REALIZE KARLA (DESIRE MAN1 FEATHERS)) 
       (OFFER KARLA FEATHERS MAN1)) 
(FOLLOW (OFFER KARLA FEATHERS MAN1) 
        (OBTAIN MAN1 FEATHERS)) 
(CAUSE (OBTAIN MAN1 FEATHERS) 
       (EQUALS (HAPPINESS MAN1) HIGH)) 
(FOLLOW (PROMISE MAN1 KARLA (NOT (ATTACK MAN1 KARLA))) 
        (NOT (ATTACK MAN1 KARLA))) 

(ANTLERED DEER) 
(HOOFED DEER) 
(QUADRIPED DEER) 
(MAMMAL DEER) 
(THIN CROSS-BOW) 
(LARGE CROSS-BOW) 
(MEDIEVAL CROSS-BOW) 
(WOODEN CROSS-BOW) 
(WEAPON CROSS-BOW) 
(BLACK FEATHERS) 
(COVERING FEATHERS) 
(LONG FEATHERS) 
(SOFT FEATHERS) 
(ASSET FEATHERS) 
(VOCAL MAN1) 
(BIPED MAN1) 
(HUNTER MAN1) 
(WARLIKE MAN1) 
(HUMAN MAN1) 
(MALE MAN1) 
(PREDATORY KARLA) 
(BLACK KARLA) 
(POWERFUL KARLA) 
(LARGE KARLA) 
(HAWK KARLA) 




4.1 Cognitive Simulation Experiment 1 

In our first study, we put the 9 base stories in memory, along with the 9 FOR stories which served as 
distractors. We then used each of the variants - LS, SF, and AN - as probes. This roughly resembles the 
original task, but MAC/FAC's job is easier in that (1) it has only 18 stories in memory, while subjects had 
32, in addition to their vast background knowledge; (2) subjects were tested after a week's delay, so that 
there could have been some degradation of the memory representations. 

Table 3 shows the proportion of times the base story made it through MAC and (then) through FAC. The 
FAC output is what corresponds to human retrievals. MAC/FAC's performance is much better than that of 
the human subjects, perhaps partly because of the differences noted above. However, the key point is that 
its results show the same ordering as those of human subjects: LS > SF > AN. 

Probes    MAC     FAC 
LS        1.0     1.0 
SF        0.89    0.89 
AN        0.67    0.67 

Memory contains 9 base stories and 9 FOR matches; probes were the 9 LS, 9 SF, and 9 AN stories. 
The rows show the proportion of times the correct base story was retrieved for different probe types. 

Table 3: Proportion of correct retrievals given different kinds of probes 



4.2 Cognitive Simulation Experiment 2 

To give MAC/FAC a harder challenge, we put the four variants of each base story into memory. This 
made a larger memory set (36 stories) and also one with many competing similar choices. Each base story 
in turn was used as a probe. This is almost the reverse of the task subjects faced, and is more difficult. 

Table 4 shows the mean number of matches of different similarity types that succeed in getting through 
MAC and (then) through FAC. There are several interesting points to note here. First, the retrieval results 
(i.e., the number that make it through both stages) ordinally match the results for human subjects: LS > SF 
> AN > FOR. This degree of fit is encouraging, given the difference in task. Second, as expected, MAC 
produces some matches that are rejected by FAC. This number depends partly on the criteria for the two 
stages. Here, with MAC and FAC both set at 10%, the mean number of memory items produced by MAC 
is 3.4, and the mean number accepted by FAC is 1.6. Third, as expected, FAC succeeds in acting as a 
structural filter on the MAC matches. It accepts all of the LS matches MAC proposes and some of the 
partial matches (i.e., SF and AN), while rejecting most of the inappropriate matches (i.e., FOR and 
matches with stories from other sets). 



Retrievals    MAC     FAC
LS            0.78    0.78
SF            0.78    0.44
AN            0.33    0.22
FOR           0.22    0.0
Other         1.33    0.22

Memory contains 36 stories (LS, SF, AN, and FOR for 9 story sets); the 9 base stories were used as probes. 
Other = any retrieval from a story set different from the one to which the base belongs. 

Table 4: Mean numbers of different match types retrieved per probe when base stories are used 
as probes 

4.3 Cognitive Simulation Experiment 3 

In the prior simulations, LS matches were the resounding winners. While this is reassuring, it is also 
interesting to know which matches would be retrieved if there were no perfect overall matches. Therefore 
we removed the LS variants from memory and repeated the second simulation experiment, again probing 
with the base stories. As Table 5 shows, SF matches are now the clear winners in both the MAC and FAC 
stages. Again, the ordinal results match well with those of subjects: SF > AN > FOR. 



Retrievals    MAC     FAC
SF            0.88    0.78
AN            0.56    0.56
FOR           0.22    0.11
Other         1.11    0.11

Memory contains 27 stories (9 SF, 9 AN, 9 FOR); the 9 base stories were used as probes. 

Table 5: Mean numbers of different match types retrieved per probe with base stories as probes and 
no LS stories in memory 

4.4 Summary of Cognitive Simulation Experiments 

The results are encouraging. First, MAC/FAC's retrieval results (i.e., the number that make it through both 
stages) ordinally match the results for human subjects: LS > SF > AN > FOR. Second, as expected, MAC 
produces some matches that are rejected by FAC. The mean number of memory items produced by MAC is 
3.4 and the mean number accepted by FAC is 1.6. Third, FAC succeeds in its job as a structural filter on 
the MAC matches. It accepts all of the LS matches proposed by MAC and some of the partial matches 
(the SF, AN and FOR matches), and rejects most of the inappropriate matches (the "other" matches from 
different story sets). It might seem puzzling that FAC accepts more SF matches than AN matches, when it 
normally would prefer AN over SF. The reason is that it is not generally being offered this choice. Rather, 
it must choose the best from the matches passed on by MAC for a given probe (which might be AN and 
LS, or SF and LS, for example). 

It is useful to compare MAC/FAC's performance with that of Thagard, Holyoak, Nelson & Gochfeld's 
(1990) ARCS model of similarity-based retrieval, the most comparable alternate model. Thagard et al. gave 
ARCS the Karla the hawk story in memory along with 100 fables as distractors. When given the four 
similarity variants as probes, ARCS produced asymptotic activations as follows: LS (.67), FOR (-.11), SF 
(-.17), AN (-.27). ARCS thus exhibits at least two violations of the LS > SF > AN > FOR order found for 
human remindings. First, SF remindings, which should be about as likely as LS remindings, are quite 
infrequent in ARCS, less frequent than even the FOR matches. Second, AN matches are less frequent 
than FOR matches in ARCS, whereas for humans AN was always ordinally greater than FOR and (in 
Experiment 1) significantly so. Thus MAC/FAC explains the data better than ARCS. This is especially 
interesting because Thagard et al. argue that a complex localist connectionist network which integrates 
semantic, structural, and pragmatic constraints is required to model similarity-based reminding. While 
such models are intriguing, MAC/FAC shows that a simpler model can provide a better account of the 
data. We compare MAC/FAC with ARCS in more detail in Section 6. 
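
The two ordering violations can be checked mechanically. The following is a minimal sketch that uses only the activations quoted above; the function name and the pairwise treatment of the ordering are illustrative assumptions, not part of either model.

```python
from itertools import combinations

# Asymptotic activations reported for ARCS in the text above.
arcs_activation = {"LS": 0.67, "FOR": -0.11, "SF": -0.17, "AN": -0.27}

# Ordering found for human remindings, strongest first.
human_order = ["LS", "SF", "AN", "FOR"]


def order_violations(scores, expected_order):
    """Return pairs (a, b) where a should outrank b but does not."""
    return [
        (a, b)
        for a, b in combinations(expected_order, 2)
        if scores[a] <= scores[b]
    ]


violations = order_violations(arcs_activation, human_order)
print(violations)
```

Under this pairwise reading, the discordant pairs are SF versus FOR and AN versus FOR, which are the two violations named in the text.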

Finally, and most importantly, MAC/FAC's overall pattern of behavior captures the motivating phenomena. 
It allows for structured representations and for processes of structural alignment and mapping over these 
representations, thus satisfying the structural representation and structured mappings criteria. It produces 
fewer analogical matches than literal similarity or surface matches, thus satisfying the existence of rare 
insights criterion. The majority of its retrievals are LS matches, thus satisfying the primacy of the 
mundane criterion. It also produces a fairly large number of SF matches, thus satisfying the surface 
superiority criterion. Finally, its algorithms are simple enough to apply over large-scale memories, thus 
satisfying the scalability criterion. 

5. Sensitivity Analyses 

The experiments of the previous section show that the MAC/FAC model can account for psychological 
retrieval data. This section looks more closely into why it does, by seeing how sensitive the results are to 
different factors in the model. These analyses are similar in spirit to those carried out by VanLehn (1989) 
in his SIERRA project. VanLehn used his model to generate different possible learning sequences to see if 
these variations covered the space of observed mistakes made by human learners in subtraction problems. 
Thus variations in the model were used to generate hypotheses about the space of individual differences. 
Our methodology is quite similar, in that we vary aspects of our model in order to better understand how it 
accounts for data. The key difference is that we are not attempting to model individual differences, but 
instead are investigating how our results depend on different aspects of the theory. Such sensitivity 
analyses are routinely used in other areas of science and engineering; we believe they are also an important 
tool for cognitive modeling. 

Sensitivity analyses can provide insight into why a simulation works. Any working cognitive simulation 
rests on a large number of design choices. Examples of design choices include the setting of parameters, 
the kinds of data provided as input, and even the particular algorithms used. Some of these design choices 
are forced by the theory being tested, some choices are only weakly constrained by the theory, and others 
are irrelevant to the theory being tested, but are necessary to create a working artifact. Sensitivity analyses 
can help verify that the source of a simulation's performance rests with the theoretically important design 
choices. Varying theoretically forced choices should lead to a degradation of the simulation's ability to 
replicate human performance. Otherwise, the source of the performance lies elsewhere. On the other hand, 
varying theoretically irrelevant choices should not affect the results, and if they do, it suggests that 
something other than the motivating theory is responsible for the simulator's performance. Finally, seeing 
how the ability to match human performance varies with parameters that are only weakly constrained by 
theory can lead to insights about why the model works. 

In the rest of this section, we describe a series of sensitivity experiments on MAC/FAC. These experiments 
demonstrate that its ability to replicate human performance is robust, and that this ability depends crucially 
on the theoretically important design choices. We first describe the methodology used in these experiments 
in detail, and then describe three sensitivity analyses. 

5.1 Method for Sensitivity Analyses 

A sensitivity analysis requires a standard of comparison, a baseline against which to judge the results of 
variations. We use as our baseline the simulation experiments described in Section 4. We say that a 
particular set of design choices satisfies the data if re-running the simulation experiments with that set of 
design choices yields results which match the human data. That is, the frequency of retrievals must follow 
the pattern LS > SF > AN > FOR. 
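
As a minimal sketch of this criterion, the check below tests whether a set of mean retrieval counts follows the required ordering. The function name, the use of strict inequalities, and the decision to skip match types that are absent from memory are assumptions made for illustration.

```python
def satisfies_data(retrievals):
    """True if retrieval frequencies follow LS > SF > AN > FOR.

    `retrievals` maps a match type to its mean number of retrievals.
    Match types missing from the mapping are skipped (an assumption).
    """
    order = [t for t in ("LS", "SF", "AN", "FOR") if t in retrievals]
    return all(retrievals[a] > retrievals[b] for a, b in zip(order, order[1:]))


# FAC column of Table 4.
table4_fac = {"LS": 0.78, "SF": 0.44, "AN": 0.22, "FOR": 0.0}
print(satisfies_data(table4_fac))
```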

There are many design choices which could be explored via sensitivity analyses. Conceptually, one can 
think of sets of design choices as points in a high-dimensional space. In essence, the simulation studies of 
Section 4 provide information about one point in the design space. This metaphor is excellent for choices 
of numerical parameters, since these dimensions can be viewed as continuous. It is not as 
useful for other kinds of design choices, e.g., algorithmic choices, since systematically enumerating the set 
of plausible algorithms for a task is quite difficult. Choosing two numerical dimensions to vary makes the 
results easiest to visualize: patterns of satisfaction can be displayed as a table whose entries represent 
measurements of the ability of the model to satisfy the data at sampled points in the design space. 

The two most interesting numerical parameters with respect to sensitivity analyses on MAC/FAC are the 
selector widths for the MAC and FAC stages, since these are only weakly constrained by the theory. They 
should be narrow, in order to reject inappropriate remindings, but we currently see no theoretically 
motivated method to calculate precise predictions for these parameters. Therefore in these analyses we use 
an empirical approach. We vary the selector widths, using these variations as the axes of a subset of the 
design space. Recall that a selector of width W accepts all matches within W% of the largest input. That 
is, a selector with width 10% outputs the best match plus any other matches that are within 10% of the 
score of the best match, while a selector of width 100% will simply pass through all of its inputs. In the 
experiments below, selector widths for both MAC and FAC are varied from 1% to 100% in 10% 
increments. Each entry in the table indicates whether that pair of width settings, combined with the other 
design choices, satisfied the data. When the pattern of retrieval is violated, the table entry contains 
information about the particular kind of violation. 
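
The selector can be sketched in a few lines. This is an illustrative reading of "within W% of the largest input" as a threshold of best × (1 − W), assuming non-negative scores; the function name and the example stories and scores are hypothetical.

```python
def selector(scores, width):
    """Keep items scoring within `width` (a fraction, 0.10 = 10%) of the best.

    `scores` maps item name -> match score (assumed non-negative).
    """
    if not scores:
        return {}
    best = max(scores.values())
    threshold = best * (1.0 - width)
    return {item: s for item, s in scores.items() if s >= threshold}


# Hypothetical MAC scores for three memory items.
mac_scores = {"story-a": 0.90, "story-b": 0.84, "story-c": 0.50}
print(selector(mac_scores, 0.10))  # narrow selector
print(selector(mac_scores, 1.00))  # width 100% passes everything through
```

In a two-stage arrangement, the same function would be applied once to the MAC scores and again to the FAC scores of the survivors, each with its own width.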

Viewed as a map, the table of results from the sensitivity analysis can be divided into viable regions, 
subspaces of design choices which allow the model to satisfy the data, and non-viable regions, where they 
do not. The existence of viable regions is of course critical for a successful simulation. However, the 
nature of the non-viable regions is also interesting, because they provide a source of insight into why the 
model works. Seeing how a bridge collapses after replacing a particular strut with a weaker material 
(preferably via simulation) supports the conclusion that the strength of that strut was a factor in preventing 
collapse. 

It should be noted that the computational cost of these experiments is large but not horrendous. 
Essentially, the cognitive simulation experiments of Sections 4.1 and 4.2 were replicated for each pair of 
selector widths, i.e., 121 times. Each repetition required running the MAC matcher 810 times,* for a total 
of 98,010 times. The number of FAC executions varies with the size of the set output from MAC, of 
course, and varied substantially according to the particular design choices made (as shown below). A 
reasonably accurate estimate for the lower bound of FAC executions for each experiment was 900 and a 
reasonable upper bound is 1,600. The MAC matcher takes roughly 0.002 seconds for each pair of content 
vectors and the FAC matcher (i.e., SME) takes between 1.0 and 11 seconds for each pair of structured 
representations, with an average time of roughly 4 seconds.** Thus the time to run MAC/FAC for each 
probe typically ranges from 3 to 10 seconds. A naive system for doing sensitivity analyses could use as 
much as five hours per analysis (14,000 seconds for MAC, 4,000 seconds for FAC). However, we found 
that by caching the results of matches in a simple database, we could cut the CPU requirements for these 
analyses considerably. 
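
The bookkeeping and the caching idea can be illustrated as follows. This is a sketch only: it assumes that the score for a given probe and memory item does not depend on the selector widths, so each pair needs to be matched once. `cached_match` and its placeholder score are hypothetical stand-ins for the real MAC or SME computation, and an in-memory cache stands in for the database.

```python
from functools import lru_cache

# Counts taken from the text and its footnote.
mac_runs_per_replication = 18 * 27 + 36 * 9  # 486 + 324 = 810
replications = 121                           # pairs of selector widths
total_mac_runs = mac_runs_per_replication * replications

calls = {"count": 0}


@lru_cache(maxsize=None)
def cached_match(probe, item):
    """Stand-in for an expensive matcher; real code would run MAC or SME."""
    calls["count"] += 1
    return len(set(probe) & set(item))  # hypothetical placeholder score


probes = [("a", "b"), ("b", "c")]
memory = [("a", "c"), ("b", "d")]
for _ in range(replications):  # the same pairs recur for every width pair
    for p in probes:
        for m in memory:
            cached_match(p, m)

print(total_mac_runs, calls["count"])
```

With the cache in place, the toy matcher runs once per distinct pair rather than once per pair per replication.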



* The first experiment involves 486 MAC executions because there are 18 stories in memory and 27 
probes. The second experiment involves 324 MAC executions because there are 36 stories in memory and 
9 probes. 

** These times are for an IBM RS/6000 Model 350 and SME3b, which was used in all experiments in this 
section. An earlier version of SME was used in Forbus & Gentner 1991, and in the experiments in Section 
6. 



5.2 Sensitivity Analysis One: Robustness 

In this experiment we tested the robustness of MAC/FAC's ability to satisfy the data by varying the 
selector widths. Table 6 shows the results. Notice that there is one region which satisfies the data: When 
the MAC width is between 10 % and 20 % and FAC is at least 10 %. The moderately large viable 
subspace indicates that MAC/FAC's performance is robust, and not hostage to a particular choice of 
selector width settings. 



Legend 

Y = Satisfies the data 
4 = No analogies 
64 = SF < AN 
80 = LS < AN, SF < AN 
112 = LS < AN, SF < AN, LS < FOR 
368 = LS < AN, SF < AN, LS < FOR, LS < DT 

Rows are the width of the MAC selector; columns correspond to the width of the FAC selector. The codes 
describe whether that combination of selector widths allows MAC/FAC to account for the human data, and if not, 
what criteria were violated. 

Table 6: Sensitivity to selector width, normalized content vectors 



As discussed above, it is important to show that there are parameter settings that do not fit the human data, 
to establish that the theoretical variables actually matter. When either MAC or FAC is too narrow (i.e., 
MAC of 1% or FAC of 1%), analogies are never retrieved. This violates the rare insights criterion. 
When MAC is broad (30% or larger), making FAC too broad leads first to too many analogies, and then 
to junk remindings. The shape of the region of viability suggests that while FAC is necessary to provide 
structural matching and candidate inferences, MAC provides most of the filtering. Since that is MAC's 
intended purpose, this provides further evidence that the simulation works according to the principles of its 
design, rather than some unknown factor. 
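
The numeric violation codes in the table legends of this section are consistent with sums of powers of two, one bit per violated criterion (for example, 80 = 64 + 16). The text does not state this encoding, so the flag values below are an inference from the legend entries, and the decoder is only a reading aid.

```python
# Assumed flag values, inferred from the legend entries (not stated in the text).
FLAGS = {
    2: "No surface matches",
    4: "No analogies",
    8: "LS < SF",
    16: "LS < AN",
    32: "LS < FOR",
    64: "SF < AN",
    128: "AN < FOR",
    256: "LS < DT",
}


def decode(code):
    """List the violations whose flag bits are set in `code`."""
    return [name for bit, name in FLAGS.items() if code & bit]


for code in (4, 64, 80, 112, 368):
    print(code, decode(code))
```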

The evidence that the results are not very sensitive to the particular choice of selector widths in the original 
experiments (i.e., 10 % for both MAC and FAC) is reassuring. The next two sensitivity analyses explore 
other design choices, using the same methodology as this experiment. 

5.3 Sensitivity Analysis Two: Irrelevance of normalization details 

In other sensitivity experiments on analogical processing algorithms (Forbus & Gentner, 1989), we 
demonstrated that the choice of normalization algorithm could affect outcomes in simulations of structural 
evaluation in comparisons. The purpose of this analysis is to determine if our design choice of using unit 
content vectors (see Section 3.3) was a significant factor in our results. 

To explore this question, we consider two variations on the content vector representation. The first 
variation is simply not to use any kind of normalization at all. That is, we simply use as the strength of 
each component of the content vector the number of statements and terms which contained the 
corresponding predicate. (The computation of normalized content vectors involves an additional step, of 
dividing each component by the total number of predicates in the description.) The results of this 
manipulation are illustrated in Table 7. The key point to notice about this table is that the viable region is 
roughly the same size and shape as the viable region for normalized content vectors. This lends support to 
the claim that the outcome of the simulation experiments is not heavily determined by the particular 
normalization algorithm chosen. 

Legend 

4 = No analogies 
64 = SF < AN 
80 = LS < AN, SF < AN 
88 = LS < SF, LS < AN, SF < AN 
112 = LS < AN, LS < FA, SF < AN 
120 = LS < SF, LS < AN, LS < FA, SF < AN 
256 = LS < DT 
336 = LS < AN, SF < AN, LS < DT 
368 = LS < AN, LS < FA, SF < AN, LS < DT 
376 = LS < SF, LS < AN, LS < FA, SF < AN, LS < DT 

Rows are the width of the MAC selector; columns correspond to the width of the FAC selector. The codes 
describe whether that combination of selector widths allows MAC/FAC to account for the human data, and if not, 
what criteria were violated. 

Table 7: Sensitivity analysis, unnormalized content vectors 

The second variation we consider is to change what aspect of the overlap content vectors measure. Recall 
that the idea of content vectors is to compare the pattern of functors which appear in two structured 
descriptions. There are several ways to characterize such patterns. The MAC/FAC design choice, 
normalized content vectors, estimates the overlap in terms of the relative frequency of functors in the two 
descriptions, independent of their sizes. The unnormalized content vectors just examined estimate the total 
size of the overlap. But is it the pattern of overlap that is relevant, or just how many functors two 
descriptions have in common? We can investigate this question by changing the structure of content 
vectors so that they represent only the set of predicates that are used in the structured representation, 
without regard to number of occurrences. We call this variation binary content vectors since each 
component is essentially a one-bit answer to the question of whether the structured representation contains 
or does not contain a particular predicate. Thus the dot product of two binary content vectors is a measure 
of the overlap in number of shared predicates. (Again, we normalize to unit vectors, both to avoid size 
biases and because we assume that psychologically plausible implementation substrates (e.g., neural 
systems) will have limited dynamic range.) The results of this manipulation are shown in Table 8. 

Legend 

Y = Syslit predictions satisfied 
4 = No analogies 
64 = SF < AN 
80 = LS < AN, SF < AN 
112 = LS < AN, LS < FA, SF < AN 
368 = LS < AN, LS < FA, SF < AN, LS < DT 

Binary content vectors measure the size of overlap in predicates. As before, rows are the width of the MAC 
selector; columns correspond to the width of the FAC selector. The codes describe whether that combination of 
selector widths allows MAC/FAC to account for the human data, and if not, what criteria were violated. 

Table 8: Sensitivity analysis for binary content vectors 



Again, the overall pattern of results is the same: With selector widths that are too narrow no analogies are 
retrieved, and with selector widths that are too broad, too many analogies are retrieved, followed as widths 
increase by too many "junk" retrievals. The interesting difference is that the region for the selector widths 
has changed: The viable wide-FAC range lies with MAC between 30% and 50%, whereas it was between 10% 
and 20% for the original content vectors. Comparing the average number of representations output by MAC 
for these ranges provides some insight as to why this should be so: For binary content vectors the average 
output size was 2, for standard content vectors the average was 1.5. In both cases, the next step of MAC 
selector width allows, on the average, another representation to make it through to FAC. Yet one more step 
in MAC selector width allows many more representations to get through to FAC. Thus measuring only the 
number of shared predicates shifts the viable region, but does not substantially change its character. 

From these two analyses, we conclude that the choice of normalization algorithm does not substantively 
affect the results. Since the normalization algorithm is not a theoretically determined choice, these analyses 
support the conclusion that the simulation works according to the theoretical account. 
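
The three content-vector variants compared in this section can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that "normalized" means scaling to unit Euclidean length (the text also describes dividing by the total number of predicates), and the functor lists are hypothetical.

```python
import math
from collections import Counter


def content_vector(functors, mode="normalized"):
    """Build a content vector from a non-empty list of functor occurrences.

    mode: "unnormalized" = raw counts, "normalized" = counts scaled to unit
    length, "binary" = presence/absence scaled to unit length.
    """
    counts = Counter(functors)
    if mode == "binary":
        counts = Counter({f: 1 for f in counts})
    if mode == "unnormalized":
        return dict(counts)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {f: c / norm for f, c in counts.items()}


def dot(u, v):
    """Dot product over the functors the two vectors share."""
    return sum(w * v[f] for f, w in u.items() if f in v)


# Hypothetical functor occurrences for a probe and a memory item.
probe = ["CAUSE", "BITE", "BITE", "DOG"]
item = ["CAUSE", "BITE", "CAT"]
for mode in ("normalized", "unnormalized", "binary"):
    score = dot(content_vector(probe, mode), content_vector(item, mode))
    print(mode, round(score, 3))
```

The unnormalized score grows with description size, while the two unit-length variants stay between 0 and 1, which is one way to see why the viable selector widths can shift between variants without the overall pattern changing.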

5.4 Sensitivity Analysis Three: Attributes versus relations 

Content vectors homogenize structured representations. They unify information about attributes of objects, 
relationships between objects, and argument structure. Is including every kind of information in content 
vectors necessary? Given the frequency of literal similarity and surface feature matches, both of which 
share many attributes, a possible hypothesis is that content vectors could be built using attributes alone. 
At the other extreme, the approaches used in case-based reasoning tend to ignore attributes, and use only 
relational information. To mimic these approaches in MAC/FAC, we could use content vectors which 
leave out attributes, and include only relational predicates. This analysis explores both of these extreme 
hypotheses. 

To explore the degree to which using attribute information only in content vectors would allow MAC/FAC 
to satisfy the data, we modified the algorithm which computes content vectors to ignore anything other than 
attributes. The results of the sensitivity analysis are shown in Table 9. The pattern of results is 
dramatically different from that in previous experiments. There is no viable region at all. This experiment 
provides strong evidence that using attribute information alone in content vectors cannot satisfy the data. 

The failure of attributes alone to provide adequate filtering may not be surprising. Is relational information 
alone enough? To explore this question we again modified the algorithm that computes content vectors, 
this time to not include attributes. These new content vectors therefore only contained relationships 
between objects and higher-order relations, such as logical connectives. The same methodology for the 
sensitivity analysis was followed. 
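
Both manipulations amount to filtering functors by kind before the content vector is built. The sketch below illustrates this with a hypothetical tagging of a toy vocabulary; the names `KIND` and `filtered_counts` and the example functors are assumptions, not taken from the simulation.

```python
from collections import Counter

# Hypothetical tagging of functors; the actual vocabulary is not listed here.
KIND = {
    "RED": "attribute",
    "LARGE": "attribute",
    "BITE": "relation",
    "CAUSE": "relation",
}


def filtered_counts(functors, keep):
    """Count only functors whose kind is in `keep`."""
    return Counter(f for f in functors if KIND[f] in keep)


story = ["RED", "LARGE", "BITE", "CAUSE", "BITE"]
print(filtered_counts(story, {"attribute"}))  # attribute-only vector
print(filtered_counts(story, {"relation"}))   # relation-only vector
```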

The results of the sensitivity analysis are shown in Table 10. Like the attribute-only content vectors, the 
relation-only content vectors also fail to satisfy the data in a psychologically plausible manner, but for 
different reasons. Almost uniformly, that is, when either the MAC width is less than 40% or when the 
FAC width is greater than 70%, more "junk" matches come through: stories from other sets, and FOR 
stories (i.e., those which match only in terms of first-order relations and not attributes or causal structure). 
In the region where the data is not satisfied and the MAC width ranges between 20% and 70%, the failures 
are very much like those that occur for the attribute-only vectors (e.g., more analogies retrieved with narrow 
FAC than is psychologically plausible). There is in fact a region in Table 10 where the pattern of results matches 
the human data, when the MAC width is between 40% and 50% and the FAC width ranges from 20% to 
either 60% or 70%. However, the size of the MAC output in this range is roughly one half of the total size 
of the memory pool. Consequently, this is not a viable region, because it demands far too much of FAC. 

Legend 

Y = Syslit predictions satisfied 
4 = No analogies 
64 = SF < AN 
80 = LS < AN, SF < AN 
88 = LS < SF, LS < AN, SF < AN 
112 = LS < AN, LS < FA, SF < AN 
120 = LS < SF, LS < AN, LS < FA, SF < AN 
260 = No analogies, LS < DT 
320 = SF < AN, LS < DT 
322 = No surface matches, SF < AN, LS < DT 
326 = No surface matches, no analogies, SF < AN, LS < DT 
336 = LS < AN, SF < AN, LS < DT 
344 = LS < SF, LS < AN, SF < AN, LS < DT 
368 = LS < AN, LS < FA, SF < AN, LS < DT 
376 = LS < SF, LS < AN, LS < FA, SF < AN, LS < DT 

These results were obtained using content vectors which only included attributes, leaving out relations and logical 
connectives. As before, rows are the width of the MAC selector; columns correspond to the width of the FAC 
selector. The codes describe whether that pair of selector widths allows MAC/FAC to account for the human data, 
and if not, what criteria were violated. 

Table 9: Sensitivity analysis of attribute-only content vectors 

Legend 

Y = Satisfies the psychological data 
4 = No analogies 
64 = SF < AN 
80 = LS < AN, SF < AN 
112 = LS < AN, SF < AN, LS < FOR 
256 = LS < DT 
260 = No analogies, LS < DT 
262 = No surface matches, no analogies, LS < DT 
264 = LS < SF, LS < DT 
268 = No analogies, LS < SF, LS < DT 
270 = No surface matches, no analogies, LS < SF, LS < DT 
336 = LS < AN, SF < AN, LS < DT 
368 = LS < AN, SF < AN, LS < FOR, LS < DT 
384 = AN < FOR, LS < DT 

These results were obtained using content vectors which only included relations and logical connectives, leaving out 
attributes. As before, rows are the width of the MAC selector; columns correspond to the width of the FAC 
selector. The codes describe whether that pair of selector widths allows MAC/FAC to account for the human data, 
and if not, what criteria were violated. 

Table 10: Manipulation: Relation-only vectors 



These experiments provide evidence that neither attribute information nor relational structure, by 
themselves, provide the right kind of information to allow the MAC/FAC model to plausibly satisfy the 
psychological data. Although such generalizations must be viewed with caution, the analysis of why these 
alternatives fail may be applied to any retrieval model, not just MAC/FAC. Using attribute information 
alone does not allow a retrieval system to satisfy the rare insights criterion, since the relational information 
is not used as a cue in retrieval. Using relational information alone tends to violate the scalability criterion, 
since large fractions of memory must be searched when the discrimination provided by the relational 
vocabulary is inadequate. 



6. Comparing MAC/FAC and ARCS on ARCS datasets 

As mentioned earlier, the model of similarity-based retrieval that is closest to MAC/FAC is ARCS 
(Thagard et al., 1990). The ARCS algorithm is shown in Figure 8. ARCS uses a localist connectionist 
network to apply semantic, structural, and pragmatic constraints to selecting items from memory. Most of 
the work in ARCS is carried out by the constraint satisfaction network, which provides an elegant 
mechanism for integrating the disparate constraints that Thagard et al. postulate as important to retrieval. 
The use of competition in retrieval is designed to reduce the number of candidates retrieved. Using 
pragmatic information provides a means for the system's goals to affect the retrieval process. 



Given a pool of memory items I1 ... In and a probe P: 

1. For each item Ii, include it in a matching network if there are any predicates in Ii that are semantically similar 
to a predicate in P. The matching network implements semantic and structural constraints. 

2. Create inhibitory links between units representing competing retrieval hypotheses, to ensure competitive 
retrieval. 

3. Install pragmatic constraints by creating excitatory links between a special pragmatic node and every predicate 
marked by the user as important. 

4. Run the network until it settles. 

Figure 8: The ARCS algorithm 



After the network settles, an ordering can be placed on nodes representing retrieval hypotheses based on 
their activation. Unfortunately, no formal criterion was ever specified by which a subset of these retrieval 
hypotheses is selected to be considered as what is retrieved by ARCS. Consequently, in the experiments 
below we mainly focus on the subset of retrieval nodes mentioned by Thagard et al in their paper. 

6.1 Theoretical tradeoffs 

Both models have their appeals and drawbacks. Here we briefly examine several of each. 

• Pragmatic effects: In MAC/FAC it is assumed that pragmatics and context affect retrieval according to 
what is encoded in the probe. That is, we assume that plans and goals are important enough to be 
explicitly represented, and hence will affect retrieval. In ARCS additional influence can be placed on 
particular subsets of such information by the user marking it as important. The tradeoff between these 
alternatives will best be explored by embedding them in larger, task-oriented simulations, so we do not 
consider effects of pragmatics further in this paper. 

• Utility of results: Because MAC/FAC uses SME in the FAC stage, the result of retrieval can include 
novel candidate inferences. Since the purpose of retrieval is to find new knowledge to apply to the 
probe, this is a substantial advantage. ARCS could close this gap somewhat by using ACME 
(Holyoak & Thagard, 1989) as a post processor. 

• Initial filtering: MAC/FAC's content vectors represent the overall pattern of predicates occurring in a 
structured description, so that the dot product cheaply estimates overlap. ARCS' commitment to 
creating a network if there is any predicate overlap places more of the retrieval burden on the expensive 
process of setting up networks. The inclusive rather than exclusive nature of ARCS' initial stage leads 
to the paradoxical fact that a system in which pragmatic constraints are central must ignore CAUSE, 
IF, and other inferentially important predicates to be tractable. 

• Modeling inter-item effects: Wharton, Holyoak, Downing, Lange, Wickens, and Melz (1994) have 
shown that ARCS can model effects of competition between memory items in heightening the relative 
effect of structural similarity to the probe. 

Perhaps the most important issue is the notion of semantic similarity. A key issue in analogical processing 
is what criterion should be used to decide if two elements can be placed into correspondence. The FAC 
stage of MAC/FAC follows the standard structure-mapping position that analogy is concerned with 
discovering identical relational systems. Thus, other elements can be matched flexibly in service of 
relational matching: any two entities can be placed in correspondence and functions can be matched 
non-identically if doing so enables a larger structure to match. But relations have only three choices: they can 
match identically, as in (a); they can fail to match, as in (b); or, if the surrounding structural match warrants 
it, they can be re-represented in such a way that part of their representation now matches identically, as in 
the shift from (c) to (d). 

(a) HEAVIER [camel, cow] ~ HEAVIER [giraffe, donkey] 

(b) HEAVIER [camel, cow] ~ BITE [dromedary, calf] 

(c) HEAVIER [camel, cow] ~ TALLER [giraffe, donkey] 

(d) GREATER [WEIGHT(camel), WEIGHT(cow)] ~ GREATER [HEIGHT(giraffe), HEIGHT(donkey)]. 

ACME and ARCS also share the intuition that analogy is a kind of compromise between similarity of 
larger structures and similarity of individual elements, which Holyoak and Thagard (1989) term semantic 
similarity. But the local similarity metric is different. These systems use graded similarity at all levels 
and for all kinds of predicates; relations have no special status. Thus ARCS and ACME might find pair 
(b) above more similar than pair (a), because of the object similarity. This would not be true for SME and 
MAC/FAC. 

In ACME, semantic similarity was operationalized using similarity tables. For any potential matching 
term, a similarity table was used to assign a similarity rating, which was then combined with other evidence 
to decide whether the two predicates could match. Thus, in the examples above, both pair (b) and pair (c) 
stand a good chance of being matched, depending on the stored similarities between TALLER, HEAVIER, 
and BITE, camel, dromedary and giraffe, and so on. 

In ARCS, an augmented subset of WordNet (Miller, Fellbaum, Kegl, & Miller, 1988) was used to make 
semantic similarity decisions. WordNet is a psycholinguistic database describing relationships between 
words. Two predicates in ARCS are considered semantically similar if their corresponding lexical concepts 
in WordNet are connected via links that denote particular relationships. The use of WordNet as a 
database for simple lexical inferences is an appealing idea. The lexical connections found in this way 
should have well-founded motivations. Nevertheless, it is important to remember that WordNet was intended 
as a lexicon, not a language of thought. Using the lexical concepts of WordNet as a predicate vocabulary 
requires assuming that there exist conceptual representations that correspond to these lexical concepts. 
That does not seem an implausible assumption. However, assuming that relationships between words, such 
as synonym or antonym, are used in the cognitive processing of internal representations seems implausible. 

We prefer our tiered identicality account, which uses inexpensive inference techniques to suggest ways to 
re-represent non-identical relations into a canonical representation language. Such canonicalization has 
many advantages for complex, rich knowledge systems, where meaning arises from the axioms that 
predicates participate in. When mismatches occur in a context where it is desirable to make the match, we 
assume that people make use of techniques of re-representation. An example of an inexpensive inference 
technique to suggest re-representation is Falkenhainer's (1987, 1990a) minimal ascension method, which 
looks for common superordinates when context suggests that two predicates should match. The use of 
pure identicality augmented by minimal ascension allowed Falkenhainer's PHINEAS system to model the 
discovery of a variety of physical theories by analogy. We believe that WordNet could be used in a similar 
fashion, since it has superordinate information. 
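
The idea of looking for a common superordinate can be illustrated with a minimal Python sketch. This is not Falkenhainer's implementation: the toy hierarchy `PARENT`, its entries, and the function names are assumptions made only for illustration.

```python
# Toy superordinate hierarchy. The entries are illustrative assumptions,
# not taken from PHINEAS or WordNet.
PARENT = {
    "HEAVIER": "GREATER-THAN",
    "TALLER": "GREATER-THAN",
    "GREATER-THAN": "COMPARISON",
    "BITE": "PHYSICAL-ACTION",
}


def ancestors(pred):
    """Return [pred, parent, grandparent, ...] by following PARENT links."""
    chain = [pred]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def minimal_common_superordinate(p, q):
    """Return the closest shared ancestor of p and q, or None if none exists."""
    q_chain = set(ancestors(q))
    for candidate in ancestors(p):  # walk upward from p, nearest first
        if candidate in q_chain:
            return candidate
    return None


print(minimal_common_superordinate("HEAVIER", "TALLER"))
print(minimal_common_superordinate("HEAVIER", "BITE"))
```

Under this toy hierarchy HEAVIER and TALLER can be re-represented through a shared superordinate, whereas HEAVIER and BITE cannot, which mirrors the contrast between pairs (c) and (b) above.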



Holyoak & Thagard have argued that broader (i.e., weaker) notions of semantic similarity are crucial in 
retrieval, for otherwise we would suffer from too many missed retrievals. Although this at first sounds 
reasonable, there is a counterargument based on memory size. Human memories are far larger than any 
cognitive simulation yet constructed. In such a case, the problem of false positives (i.e., too many 
irrelevant retrievals) becomes critical. False negatives are of course a problem, but they can be overcome 
to some extent by reformulating and re-representing the probe, treating memory access as an iterative 
process interleaved with other forms of reasoning (as in Lange & Wharton's (1992a,b, 1993) REMIND 
model). Thus it could be argued that strong semantic similarity constraints, combined with 
re-representation, are crucial in retrieval as well as in mapping. 

How do these different accounts of semantic similarity fare in predicting patterns of retrieval? In the rest of 
this section we tackle this question by comparing the performance of MAC/FAC and ARCS on a variety of 
examples. 

6.2 Computational Experiments comparing MAC/FAC and ARCS 
6.2.1 Methods 

Each experiment below has a similar structure. First, each simulation is given a memory consisting of one 
or more databases drawn from the ARCS representations.* Retrieval is then tested with probes drawn 
from a small predefined set of stories, replicating Thagard et al.'s experiments. In some cases the memory 
is augmented by a particular story: e.g., when probing with variant Hawk stories, the Thagard et al. 
encoding of the "Karla the Hawk" story is added to memory. (This is done to see if the retrieval system 
is able to find the base story amidst the distractors, given variations on the story as probes.) 

For brevity we specify the probe set and memory contents symbolically, using "/" to distinguish probe set 
from memory and "+" to indicate set union. Thus HAWK/(PLAYS+Karla Base) indicates an experiment 
where the database of plays (plus the Karla base story) was probed with the Hawk stories. The datasets 
used and these conventions are summarized in Figure 9. 



* To date we have been unsuccessful in getting ARCS to run on many of the representations we used in 
Sections 4 and 5. In some cases ARCS' network does not settle after even 1,000 iterations, and run times 
of up to twelve hours have been required. 

Databases: 

FABLES = 100 encodings of Aesop's fables, encoded by Thagard et al. 
PLAYS = 25 encodings of Shakespeare's plays, encoded by Thagard et al. 

Story sets used as probes and memory items: 

HAWK = Thagard et al.'s encoding of the "Karla the Hawk" story set, i.e., original story, analog, appearance 
match, false analogy, and literal similarity versions. Databases using these probes have the original story added to 
memory, except when the original story itself is used as a probe. 

SG = Thagard et al.'s encoding of the Sour Grapes fable plus variations, i.e., original story, analog, appearance, 
and literal similarity versions. Databases using these probes have the original story added to memory, except when 
the original story itself is used as a probe. 

H&WSS = Thagard et al.'s encoding of Hamlet and West Side Story. When Hamlet is used as a probe it is removed 
from memory. West Side Story is never placed in memory. 

Convention: For convenience, we refer to an experimental setup by the probe stories followed by the database 
used, e.g., SG/(FABLES+PLAYS) means that the Sour Grapes fables were used as probes with a memory 
consisting of both plays and fables. When a story is used as a probe, it is removed from memory first. 

Figure 9: Databases and experimental stories used in the experiments 



Both MAC/FAC and ARCS take propositional representations as inputs, but their representation 
conventions are quite different. The most crucial difference is that structure-mapping treats attributes, 
relations, and functions differently, whereas ARCS does not distinguish them. We used the following rules 
in translation: (1) one-place predicates were classified as attributes, (2) multi-argument predicates were 
classified as relations, and (3) since the arguments to CAUSE could be either events or modal propositions, 
we treated predicates used as arguments to a CAUSE statement either as modal relations (e.g., 
BECOMING-TRUE) or functions (e.g., MARRIED, KILLED). Since functions can be substituted under 
structure-mapping's identicality criterion, we ran these experiments on representations translated both with 
and without rule (3) above, i.e., with and without functions. With one exception, noted below, the results 
were essentially identical with either translation scheme. 
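
The three translation rules can be sketched as a small classifier. This is only an illustration under stated assumptions: the function name `classify`, its parameters, and the contents of `MODAL_PREDICATES` beyond BECOMING-TRUE are hypothetical, and the actual translation procedure may have differed in detail.

```python
# Assumed set of modal predicate names. Only BECOMING-TRUE is named in the
# text; any other member would be an assumption.
MODAL_PREDICATES = {"BECOMING-TRUE"}


def classify(name, n_args, is_cause_argument=False, use_functions=True):
    """Classify an ARCS predicate for use in a structure-mapping encoding.

    use_functions=False corresponds to translating without rule (3).
    """
    if is_cause_argument and use_functions:  # rule (3)
        return "modal-relation" if name in MODAL_PREDICATES else "function"
    if n_args == 1:  # rule (1)
        return "attribute"
    return "relation"  # rule (2)


print(classify("HAWK", 1))
print(classify("MARRIED", 2, is_cause_argument=True))
print(classify("MARRIED", 2, is_cause_argument=True, use_functions=False))
```

Running the same encoding with `use_functions` switched on and off corresponds to the two translation schemes compared in the experiments.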

All runtimes are measured according to the Lucid Common Lisp internal clock. A single computer** was 
used for both simulations, so that run times would be comparable. 

Replication of computational experiments is still something of a novelty, and standards for ensuring that 
reported simulation results are repeatable have not yet been established in cognitive science. Nevertheless, 
we have taken many precautions to ensure that we have run ARCS correctly. Where numerical information 
was available, for instance, we matched numerical results reported by Thagard et al. to several decimal 
places. One concern was what should count as a retrieval in ARCS. Neither the original ARCS paper nor the 
code defines a criterion for distinguishing when an item is actually retrieved (indeed, stories with negative 
activations were sometimes considered retrievals). In reporting ARCS results we cut off the list of 
retrieved results where Thagard et al. did. In some cases (e.g., fables) this represented a sharp boundary; in 
other cases (e.g., plays) it did not. 



6.3 Experiment 1: Sour Grapes Comparison 

In the first study the memory set consists of the fables, including the Sour Grapes fable, and the probes are 
variants of Sour Grapes. For ARCS results, the numbers in parentheses represent the level of activation it 
computed for that retrieval. For MAC/FAC results, the numbers in parentheses represent the scores 
computed by the MAC or FAC stages, as marked. 

** An IBM RS/6000 Model 530, with 128MB of RAM, using Lucid Common Lisp 4.01. 



Table 11 shows the results. The results for ARCS match those reported for the simulation by Thagard 
et al. The MAC/FAC results are quite similar. Thus both systems successfully retrieve Sour Grapes from 
a database of fables when given variations of it. However, MAC/FAC is substantially faster. The runtime 
difference is fairly typical; MAC/FAC tends to be two orders of magnitude faster than ARCS when tested 
with identical data on the same computer. 



ARCS results 

Probe | Results | Seconds 
Sour Grapes, appearance | Sour Grapes (0.28) | 120 
Sour Grapes, analog | Sour Grapes (0.21) | 81 
Sour Grapes, literal similarity | Sour Grapes (0.25) | 123 

MAC/FAC results 

Probe | Results | Seconds 
Sour Grapes, appearance | FAC: Sour Grapes (0.53); MAC: Sour Grapes (0.56) | 0.3 
Sour Grapes, analog | FAC: Sour Grapes (2.03); MAC: Sour Grapes (0.62) | 0.2 
Sour Grapes, literal similarity | FAC: Sour Grapes (2.03); MAC: Sour Grapes (0.62) | 0.2 



For ARCS results, the numbers in parentheses represent the level of activation it computed for that 
retrieval. For MAC/FAC results, the numbers in parentheses represent the scores computed by the MAC 
or FAC stages, as marked. 

Table 11: Results for SG/Fables experiment 

6.4 Experiment 2: Effects of additional memory items on retrieval (Sour Grapes) 

To check the stability of results under changes in memory contents, we reran Experiment 1, adding the 
database of 25 Shakespeare plays encoded by Thagard et al. to the fables database. We then tested the 
simulations to see if they would retrieve Sour Grapes from the database of 125 fables and plays when 
probed with variations of Sour Grapes. The results are shown in Table 12. MAC/FAC's results remain 
unchanged, except for a small increase in processing time. ARCS, on the other hand, is distracted by the 
plays in one of the probe conditions. Increasing the memory by 25% has led to different results with 
ARCS. The results also hint at a possible size bias in ARCS: It appears to prefer larger descriptions in 
retrieval, at the cost of correct matches. 

ARCS results 

Probe | Results | Seconds 
Sour Grapes, appearance | Sour Grapes (0.28) | 327 
Sour Grapes, analog | The Taming of the Shrew (0.22), Merry Wives (0.18), [11 stories], Sour Grapes (-0.19) | 251 
Sour Grapes, literal similarity | Sour Grapes (0.25) | 373 

MAC/FAC results 

Probe | Results | Seconds 
Sour Grapes, appearance | FAC: Sour Grapes (0.53); MAC: Sour Grapes (0.56) | 0.4 
Sour Grapes, analog | FAC: Sour Grapes (2.03); MAC: Sour Grapes (0.62) | 0.3 
Sour Grapes, literal similarity | FAC: Sour Grapes (2.03); MAC: Sour Grapes (0.62) | 0.3 

Table 12: Results of SG probes, database = Fables + Plays 



6.5 Experiment 3: Larger Probe sizes 

While the results for MAC/FAC in Experiment 2 are satisfactory, ARCS' seemingly poor performance 
requires further investigation. Does the relative size of the probe matter in the memory swamping effect? 
To find out, we again ran both simulations, first with the plays database as memory, then with the 25 
plays and 100 fables as memory, this time using the Hamlet and West Side Story encodings as 
probes, as represented by Thagard et al. Given Hamlet as a probe, the question is whether the systems can 
retrieve a tragedy, or at least another Shakespeare play. Given West Side Story as a probe, the challenge is 
more specific: To retrieve Romeo & Juliet, the analogous play. 

Table 13 shows the results for plays only in memory, and Table 14 shows the results with both plays and 
fables in memory. The good news for ARCS is that the fables have only minimally intruded on the 
activation for the top-ranked retrieved plays. A Midsummer Night's Dream is ARCS' top-ranked retrieval 
for West Side Story, but it did also, as stated by Thagard et al., retrieve Romeo & Juliet. 

ARCS results 

Probe | Results | Seconds 
Hamlet | Romeo & Juliet (0.54), King Lear (0.53), Othello (0.46), Cymbeline (0.42), Macbeth (0.41), Julius Caesar (0.38) | 1843 
West Side Story | Midsummer Night's Dream (0.58), Romeo & Juliet (0.57) | 2539 

MAC/FAC results 

Probe | Results | Seconds 
Hamlet | FAC: Romeo & Juliet (6.79); MAC: Othello (0.86), Macbeth (0.85), Romeo & Juliet (0.83), Julius Caesar (0.81) | 22 
West Side Story | FAC: Romeo & Juliet (16.51); MAC: Romeo & Juliet (0.88) | 13 

Table 13: Results for Hamlet, West Side Story as probes, Plays database. 

MAC/FAC, on the other hand, only retrieves Romeo & Juliet with either probe. For West Side Story this 
is indeed the expected result (and we believe more intuitive than ARCS' result), but what is happening with 
Hamlet? Examining the structural evaluation scores (i.e., the FAC scores) reveals that FAC considers the 
match between West Side Story and Romeo & Juliet to be excellent (16.51), which makes sense because 
the encodings of West Side Story and Romeo & Juliet have almost isomorphic structure. When Hamlet is 
the probe, FAC is relatively indifferent. The FAC scores were: Romeo & Juliet (6.79), Julius Caesar 
(5.49), Macbeth (3.72), Othello (2.67). The next-best score is roughly 20% lower than that of Romeo & 
Juliet, which puts it below MAC/FAC's default cutoff of 10%. 
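
The cutoff arithmetic can be checked with a short Python sketch. It assumes the cutoff means "keep every item whose score is within 10% of the best score"; the function name `within_cutoff` is hypothetical and this is not the MAC/FAC source code.

```python
def within_cutoff(scores, cutoff=0.10):
    """Return item names scoring within `cutoff` of the best score, best first."""
    best = max(scores.values())
    floor = best * (1.0 - cutoff)  # lowest score that still passes
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, score in ranked if score >= floor]


# FAC scores reported in the text for Hamlet as the probe.
hamlet_fac = {
    "Romeo & Juliet": 6.79,
    "Julius Caesar": 5.49,
    "Macbeth": 3.72,
    "Othello": 2.67,
}

dropoff = (6.79 - 5.49) / 6.79  # about 0.19, i.e. roughly 20%
print(round(dropoff, 2))
print(within_cutoff(hamlet_fac))  # only Romeo & Juliet passes a 10% cutoff
```

With a 10% cutoff the floor is about 6.11, so Julius Caesar (5.49) is excluded; under this reading a looser cutoff of 20% would let it through.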



ARCS results 

Probe | Results | Seconds 
Hamlet | Romeo & Juliet (0.531), King Lear (0.528), Othello (0.45), Cymbeline (0.41), Macbeth (0.40), Julius Caesar (0.37) | 4112 
West Side Story | Midsummer Night's Dream (0.58), Romeo & Juliet (0.57) | 5133 

MAC/FAC results 

Probe | Results | Seconds 
Hamlet | FAC: Romeo & Juliet (6.79); MAC: Othello (0.86), Macbeth (0.85), Romeo & Juliet (0.83), Julius Caesar (0.81), Fable52 (0.80) | 26 
West Side Story | FAC: Romeo & Juliet (16.51); MAC: Romeo & Juliet (0.88) | 8 

Table 14: Results for Hamlet, West Side Story as probes, Plays + Fables database. 

6.6 Experiment 4: Hawk stories 

The goal of encoding the Hawk stories was to replicate the results of the Karla the Hawk studies described in 
Section 2.1.1. Thagard et al. encoded one story set and used the relative activation levels of the stories 
computed by ARCS as relative retrieval probabilities for human subjects. As Section 4.4 pointed out, 
ARCS' order of retrieval was: literal similarity, first-order overlap, appearance, analogy, which is not a 
close match to the observed human ordering of literal similarity, appearance, analogy, first-order overlap. 
By contrast, MAC/FAC matched the human ordinal results in our simulation of this experiment. 

However, our purpose in this experiment is to pursue the question of stability of results under different 
distractors. We ask two questions: (1) does MAC/FAC, using Thagard et al.'s encodings, perform 
appropriately, and (2) does changing the database used as ARCS' memory change its predicted outcomes? 
Both simulations were run with the Hawk stories as probes, and with the fables (plus Karla story) as 
memory and with both fables and plays (plus the Karla story) as memory. The results are shown in Table 
15 and Table 16 respectively. 



ARCS results 

Probe | Results | Seconds 
Karla, literal similarity | "Karla" base (0.67) | 315 
Karla, appearance | Fable55 (0.4), [7 fables], "Karla" base (-0.17) | 176 
Karla, analogy | Fable23 (0.33), [7 fables], "Karla" base (-0.27) | 127 
Karla, first-order overlap | Fable23 (0.0907), Fable55 (0.0903), [13 fables], "Karla" base (-0.11) | 17 

MAC/FAC results 

Probe | Results | Seconds 
Karla, literal similarity | FAC: "Karla" (16.07); MAC: "Karla" (0.81), Fable71 (0.74) | 6 
Karla, appearance | FAC: "Karla" (7.92); MAC: "Karla" (0.71), Fable52 (0.71), Fable71 (0.66), Fable27 (0.65), Fable5 (0.64) | 7 
Karla, analog | FAC: "Karla" (8.57); MAC: "Karla" (0.81), Fable52 (0.77), Fable5 (0.77), Fable71 (0.76), Fable45 (0.75), Fable59 (0.75), Fable27 (0.75) | 14 
Karla, first-order overlap | FAC: "Karla" (5.33), Fable5 (5.33); MAC: "Karla" (0.73), Fable71 (0.71), Fable52 (0.71), Fable5 (0.71), Fable45 (0.69), Fable59 (0.68), Fable27 (0.68) | 7 

Table 15: Results for HAWK probes, database = Fables + "Karla" base story 



No matter which database is used, MAC/FAC always retrieves the Karla story, irrespective of which 
variant story is used as a probe. The MAC scores explain why: In each case the Karla story is at the top of 
the ranking, indicating that the overlap of identical predicates is greater for Karla and its variant 
than for any other story. The fact that the Karla base story is retrieved for the literal similarity and 
appearance variants is expected. Its retrieval when the analogy is used as a probe is also reasonable 
(although if ARCS always retrieved analogs successfully it would be an implausible model). Retrieving the 
base story when the first-order overlap story is used as a probe is not so reasonable. We believe this occurs 
because the Thagard et al. representations are rather sparse and include almost no surface information, and 
thus are less natural than might be desired (cf. the specificity conjecture of Forbus & Gentner, 1989). 



ARCS results 

Probe | Results | Seconds 
"Karla", literal similarity | "Karla" base (0.67) | 614 
"Karla", appearance | Fable55 (0.40), [16 stories], "Karla" base (-0.018) | 408 
"Karla", analogy | Pericles (0.60), [17 stories], "Karla" base (-0.32) | 244 
"Karla", false analogy | Pericles (0.58), [22 stories], "Karla" base (-0.38) | 

MAC/FAC results 

Probe | Results | Seconds 
Karla, literal similarity | FAC: "Karla" (16.07); MAC: "Karla" (0.81), Fable71 (0.74) | 7 
Karla, appearance | FAC: "Karla" (7.92); MAC: "Karla" (0.71), Fable52 (0.71), Julius Caesar (0.69), Othello (0.68), Macbeth (0.67), Fable71 (0.66), Two Gentlemen of Verona (0.65), Fable27 (0.65), Hamlet (0.65), Fable5 (0.64) | 21 
Karla, analog | FAC: "Karla" (8.57); MAC: "Karla" (0.81), Julius Caesar (0.78), Two Gentlemen of Verona (0.78), Fable52 (0.77), Fable5 (0.77), Macbeth (0.76), As You Like It (0.76), Fable71 (0.76), Fable45 (0.75), Fable59 (0.75), Fable27 (0.75), Othello (0.75) | 37 
Karla, first-order overlap | FAC: "Karla" (5.33), Fable5 (5.33); MAC: "Karla" (0.73), Julius Caesar (0.72), Two Gentlemen of Verona (0.72), Fable71 (0.71), Fable52 (0.71), Fable5 (0.71), Macbeth (0.70), As You Like It (0.70), Othello (0.69), Fable45 (0.69), Hamlet (0.68) | 23 

Table 16: Results for HAWK probes, with database = Fables + Plays + "Karla" base story 

Interestingly, this experiment marks the only place where the decision to use functions in encoding made 
any real difference in the results. If no functions were used in translating the ARCS representations, the 
MAC results remained the same (because content vectors are based strictly on identical predicates), but the 
Karla base story would be knocked out of the FAC output by other stories that had more overlapping 
structure, since the causal structure in the Karla story could not be consistently mapped due to 
non-identical relations. The fact that this problem only shows up with this one probe set, out of all the 
representations made by Thagard et al., suggests that this is not a serious problem. 

As was suggested by Experiments 1 and 2, the ARCS results vary considerably with different distractor 
sets. This means that the use of relative activations to estimate relative frequencies is not a stable measure. 
Specifically, the relative ordering of first-order overlap and analogy reverses when the database of fables is 
augmented with the plays. The position of the Karla story in the activation rankings is also alarming. The 
appearance story, which should retrieve the base almost as often as the literal similarity story, has dropped 
from ninth in the ranking to 18th. Depending on where the retrieval cutoff is placed, the conclusion might 
be that ARCS fails to retrieve the Karla story given the very close surface match. 

6.7 Experiment 5: ARCS using identicality 

The results so far indicate that MAC/FAC is far more immune to false positives than ARCS. What is 
responsible for this difference? Is it MAC/FAC's use of a separate stage that performs structural filtering? 
The use of content vectors versus parallel constraint satisfaction to generate an initial set of retrieval 
candidates? MAC/FAC's identicality constraint versus ARCS' weaker semantic similarity constraint? A 
complete answer to this question will require much more empirical and theoretical work, but we can gain 
some insight by a simple experiment. We ran ARCS again, but without the WordNet-inspired similarity 
network. Under such conditions, ARCS only creates local matches between identical predicates, and the 
initial candidate set is much smaller, because the semantic similarity constraint has been greatly tightened. 

The results of this experiment are shown in Tables 17-19. Table 17 shows that the results on Sour Grapes 
have improved substantially; ARCS is no longer tempted by plays. Table 18 shows that, while A 
Midsummer Night's Dream is high on ARCS' list, it no longer prefers it to Romeo & Juliet when West 
Side Story is used as a probe. The HAWK results show the least improvement; the estimated retrieval 
order again does not match that of human subjects, and there are still many fables and plays ahead of what 
should be very close matches to the Karla base story. 



ARCS w/identicality, SG/FABLES 

Probe | Results | Seconds 
Sour Grapes, literal similarity | Sour Grapes (0.18) | 1.3 
Sour Grapes, appearance | Sour Grapes (0.28) | 23 
Sour Grapes, analogy | Sour Grapes (0.18) | 1.1 

ARCS w/identicality, SG/(FABLES+PLAYS) 

Probe | Results | Seconds 
Sour Grapes, literal similarity | Sour Grapes (0.19) | 4 
Sour Grapes, appearance | Sour Grapes (0.28) | 34 
Sour Grapes, analogy | Sour Grapes (0.19) | 4 

Table 17: ARCS w/identicality on Sour Grapes with Fables and Fables + Plays 

ARCS w/identicality, database = Plays 

Probe | Results | Seconds 
Hamlet | King Lear (0.56), Romeo & Juliet (0.52), Othello (0.47), Cymbeline (0.41), Macbeth (0.40), Julius Caesar (0.38) | 489 
West Side Story | Romeo & Juliet (0.59), Midsummer Night's Dream (0.52) | 1671 

ARCS w/identicality, database = Plays + Fables 

Probe | Results | Seconds 
Hamlet | King Lear (0.55), Romeo & Juliet (0.51), Othello (0.46), Cymbeline (0.49), Macbeth (0.39), Julius Caesar (0.37) | 1108 
West Side Story | Romeo & Juliet (0.59), Midsummer Night's Dream (0.52) | 3014 

Table 18: ARCS w/identicality, probed with Plays 



ARCS w/identicality, HAWK/FABLES 

Probe | Results | Seconds 
Karla, literal similarity | Fable23 (0.261), Fable55 (0.258), Karla story (-0.1) | 73 
Karla, appearance | Fable55 (0.4), [8 fables], Karla story (-0.23) | 114 
Karla, true analogy | Fable23 (0.26), Fable55 (0.26), [5 fables], Karla story (-0.23) | 12 
Karla, first-order overlap | Fable23 (0.087), Fable55 (0.087) | 5 

ARCS w/identicality, HAWK/(FABLES+PLAYS) 

Probe | Results | Seconds 
Karla, literal similarity | Fable23 (0.26), Fable55 (0.26), Karla story (-0.014) | 74 
Karla, appearance | Fable55 (0.25), Hamlet (0.17), Fable23 (0.067), [17 plays & fables], Karla story (-0.22) | 154 
Karla, true analogy | Pericles (0.55), [3 plays], Fable23 (-0.13), Fable55 (-0.13), [8 plays & fables], Karla story (-0.30) | 29 
Karla, first-order overlap | Pericles (0.59), [6 fables & plays], Fable23 (-0.25), Fable55 (-0.25) | 18 

Table 19: ARCS w/identicality, HAWK probes 



6.8 Conclusions from Computational Comparison Experiments 

The results of cognitive simulation experiments must always be interpreted with care. In this case, we 
believe our experiments provide evidence that MAC/FAC, using structure-mapping's identicality 
constraint, better models retrieval than ARCS, which uses Thagard et al.'s notion of semantic similarity. 
In retrieval, the special demands of large memories argue for simpler algorithms, simply because the cost of 
false positives is much higher. If retrieval were a one-shot, all-or-nothing operation, the cost of false 
negatives would be higher. But that is not the case. In normal situations, retrieval is an iterative process, 
interleaved with the construction of the representations being used. Thus the cost of false negatives is 
reduced by the chance that reformulation of the probe, due to re-representation and inference, will 
subsequently catch a relevant memory that slipped by once. 

While both are designed to allow parallel implementations, the two orders of magnitude speed difference 
clearly suggests that MAC/FAC is the more practical choice for cognitive simulation experiments. 

Finally, we note that while ARCS' use of a localist connectionist network to implement constraint 
satisfaction is in many ways intuitively appealing, it is by no means clear that such implementations are 
neurally plausible. On the other hand, we believe the evidence suggests that MAC/FAC captures 
similarity-based retrieval phenomena better than ARCS does. 

7. Discussion 

Similarity is central in transfer. To understand its role requires making fine distinctions both about 
similarity and about transfer. The psychological evidence indicates that the accessibility of matches from 
memory is strongly influenced by surface commonalities and weakly influenced by structural 
commonalities, while the rated inferential soundness of comparisons is strongly influenced by structural 
commonalities and little, if at all, influenced by surface commonalities. An account of similarity in transfer 
must deal with this dissociation between retrieval and structural alignment: between the matches people 
get from memory and the matches they want. 

The MAC/FAC model of similarity-based retrieval captures both the fact that humans successfully store 
and retrieve intricate relational structures and the fact that access to these stored structures is heavily 
(though not entirely) surface-driven. The first stage is attentive to content and blind to structure and the 
second stage is attentive to both content and structure. The MAC stage uses content vectors, a novel 
summary of structured representations, to provide an inexpensive "wide net" search of memory, whose 
results are pruned by the more expensive literal similarity matcher of the FAC stage to arrive at useful, 
structurally sound matches. 

The simulation results presented here demonstrate that MAC/FAC can simulate the patterns of access 
exhibited by human subjects. It displays the appropriate preponderance of literal similarity and surface 
matches, and it occasionally retrieves purely relational matches (Section 4). Our sensitivity studies suggest 
that these results are a consequence of our theory, and are not hostage to non-theoretically motivated 
parameters or algorithmic choices (Section 5). Our computational experiments comparing MAC/FAC and 
ARCS (Section 6) suggest that MAC/FAC accounts for the psychological results both more accurately 
and more robustly than ARCS. In addition to the experiments reported here, we have tested MAC/FAC on 
a variety of other data sets, including relational metaphors (30 descriptions, average of 12 propositions 
each) and attribute-rich descriptions of physical situations as might be found in commonsense reasoning 
(12 descriptions, averaging 42 propositions each). We have also tried various combinations of these 
databases with the Karla the Hawk data set (45 descriptions, averaging 67 propositions each). In all cases 
to date MAC/FAC's performance has been satisfactory and consistent with the overall pattern of findings 
regarding human retrieval. 

However, there are still many open issues regarding access and MAC/FAC. The rest of this section 
examines the most important of these, compares MAC/FAC to other directly relevant models, and explores 
some broader implications. 

7.1 Limitations and open questions 

7.1.1 Retrieval failure 

Sometimes a probe reminds us of nothing. Currently the only way this can happen in the MAC/FAC model 
is for FAC to reject every candidate provided by MAC. This can happen if no structurally sound match 
hypotheses can be generated between the probe and the descriptions output by MAC. (Without any local 
correspondences there can be no interpretation of the comparison.) This can happen, albeit rarely. A 
variant of MAC/FAC with thresholds on the output of either or both of the MAC and FAC stages, so that 
the system would return nothing if the best match were below criterion, would show more 
non-remindings. 
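
A thresholded variant of this kind can be sketched as follows. This is a simplified illustration, not the MAC/FAC implementation: the function `retrieve`, its parameters, and the threshold values are hypothetical, the structural matcher is passed in as a plain callable rather than SME, and the usual "within 10% of the best" selection is omitted for brevity.

```python
def retrieve(mac_scores, fac_score, mac_threshold=0.0, fac_threshold=0.0):
    """Return the best item, or None when nothing passes both stages.

    mac_scores: dict mapping memory item -> cheap first-stage score.
    fac_score:  callable mapping item -> structural score, or None when
                no structurally sound match exists.
    """
    # Stage 1: keep only items whose cheap score reaches the MAC criterion.
    candidates = [item for item, s in mac_scores.items() if s >= mac_threshold]

    # Stage 2: score survivors structurally and apply the FAC criterion.
    scored = []
    for item in candidates:
        s = fac_score(item)
        if s is not None and s >= fac_threshold:
            scored.append((s, item))

    if not scored:
        return None  # a non-reminding
    return max(scored, key=lambda pair: pair[0])[1]


mac = {"A": 0.8, "B": 0.3}
fac = {"A": 5.0, "B": 9.0}.get
print(retrieve(mac, fac))
print(retrieve(mac, fac, mac_threshold=0.5))
print(retrieve(mac, fac, mac_threshold=0.5, fac_threshold=6.0))
```

In this toy example, raising the first-stage threshold removes the structurally best item before it is ever examined, and raising both thresholds yields no reminding at all.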

7.1.2 Focused remindings and penetrability 

Many AI retrieval programs and cognitive simulations elevate the reasoner's current goals to a central role 
in their theoretical accounts (e.g., Hammond, 1986, 1989; Keane, 1988a,b; Kolodner, 1984, 1989; Riesbeck 
& Schank, 1989; Thagard, Holyoak, Nelson, & Gochfeld, 1990). Although we agree with the claim that 
goal structures are important, MAC/FAC does not give goals a separate status in retrieval. Rather, we 
assume that the person's current goals are represented as part of the higher-order structure of the probe. 
The assumption is that goals are embedded in a relational structure linking them to the rest of the situation; 
they play a role in retrieval, but the rest of the situational factors must participate as well. When one is 
hungry, for instance, presumably the ways of getting food that come to mind are different if one is standing 
in a restaurant, a supermarket, or in the middle of a forest. The inclusion of current goals as part of the 
representation of the probe is consistent with the finding of Read and Cesa (1991) that asking subjects for 
explanations of current scenarios leads to a relatively high rate of analogical reminding. However, we see 
no reason to elevate goals above other kinds of higher-order structure. By treating goals as just one of 
many kinds of higher-order structures, we escape making the erroneous prediction of many case-based 
reasoning systems: that retrieval requires common goals. People can retrieve information that was 
originally stored under different goal structures. (See Goldstein, Kedar, & Bareiss, 1993, for a discussion 
of this point.) 

A related question concerns the degree to which the results of each stage are inspectable and tunable. We 
assume that the results of the FAC stage are inspectable, but that explicit awareness of the results of the 
MAC stage is lacking. We conjecture that one can get a sense that there are possible matches in the MAC 
output, and perhaps some impression of how strong the matches are, but not what those items are. The 
feeling of being reminded without being able to remember the actual item might correspond to having 
candidates generated by MAC which are all either too weak to pass on or are rejected by the FAC stage. 

How much can MAC and FAC be affected by the subject? There is psychological evidence that people 
cannot directly control the kinds of matches they retrieve. Schumacher and Gentner (in preparation) 
investigated this by varying the test instructions given to subjects. They gave subjects lists of proverbs to 
read, followed by test proverbs that were either structurally similar or surface-similar to proverbs studied 
previously. Subjects who were told to write any prior proverbs that they were reminded of while reading 
the test proverbs recalled about twice as many surface matches as analogies. Another group of subjects 
was told to write only structural remindings and to strive for as many of these as possible. Although these 
subjects indeed wrote many fewer surface matches than the first group, they recalled only the same low 
number of analogies. The goal to seek relational matches apparently led people to filter non-relational 
matches, but not to find more relational matches. This suggests that the FAC matcher may be tunable (in 
that subjects were able to filter out the surface matches) but not the MAC matcher (in that subjects were 
not able to produce more analogies on demand). 

The idea that FAC, though not MAC, is tunable is consistent with evidence that people can be selective in 
similarity matching once both members of a pair are present. For example, in a triads task, matching XX 
to OO or XO, subjects can readily choose either only relational matches (XX - OO) or only surface 
matches (XX - XO) (Gentner & Markman, 1994a,b; Goldstone, Medin & Gentner, 1991; Medin, 
Goldstone & Gentner, 1993). This could be modeled by assuming either that the FAC matcher is tuned 
according to such task constraints, or that there is a separate tunable similarity processor for comparing 
items within working memory. 



7.1.3 Size of content vectors 

One potential problem with scaling up MAC/FAC is the growth in the size of content vectors. 
Our current descriptions use a vocabulary of only a few hundred distinct predicates. We implement content 
vectors via sparse encoding techniques, analogous to those used in computational matrix algebra, for 
efficiency. However, a psychologically plausible representation vocabulary may have hundreds of 
thousands of predicates. It is not obvious that our sparse encoding techniques will suffice for vocabularies 
that large, nor does this implementation address the question of how systems with limited "hardware 
bandwidth", such as connectionist implementations, could serve as a substrate for this model. 
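
As a rough illustration of the sparse encoding idea, the following Python sketch stores a content vector as a 
mapping from predicate names to occurrence counts, so that storage and the dot product scale with the 
predicates actually present in a description rather than with the whole vocabulary. The dictionary 
representation, the function names, and the example predicates are illustrative assumptions and not the 
MAC/FAC implementation itself. 

```python
from collections import Counter

def content_vector(predicates):
    """Sparse content vector (assumed form): predicate name -> occurrence count."""
    return Counter(predicates)

def sparse_dot(u, v):
    """Dot product that only visits predicates stored in the smaller vector."""
    if len(u) > len(v):
        u, v = v, u
    return sum(count * v[pred] for pred, count in u.items() if pred in v)

# Hypothetical descriptions, listed as the predicates they mention.
probe = content_vector(["cause", "greater-than", "flow", "pressure", "pressure"])
item = content_vector(["cause", "greater-than", "flow", "temperature", "temperature"])

score = sparse_dot(probe, item)  # only cause, greater-than and flow overlap
```

Under this encoding the cost of one comparison depends on the size of the smaller description, which is one 
reason the approach may or may not suffice once the vocabulary and the memory both become very large. 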

This scale problem is mitigated partly by MAC/FAC's basic architecture with its cheap initial filter. 
However, there are at least two further possible ways to address the potential scale problem in the size of 
content vectors. The first is abstraction. In symbolic knowledge representations, predicates and functions 
are often arranged in hierarchies. For example, a complex concept such as bequeath might be stored as a 
specialization of the concept of giving, which might in turn be a specialization of the concept of transfer. 
Let us view the set of specializations between predicates as a lattice. Any set of predicates that partitions 
the lattice can be used to formulate a semantically compressed content vector as follows: The weight of a 
component of the compressed content vector is a function of the number of occurrences of that predicate - 
and all predicates below it in the partition - in the description. In effect, predicates below the selected 
subsets are replaced with more abstract versions. Another possible solution for the scale problem is 
factorization: the predicates could be partitioned into subsets that are tightly interrelated and separate 
content vectors computed for each subset. This organization presumes that there is some fixed size bound 
on processing modules, but that several processing modules can be synchronized well enough to 
accumulate results across them. 
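
The two proposals can be sketched in a few lines of Python. This is a minimal sketch under stated 
assumptions: the specialization hierarchy is a small hypothetical tree rather than a general lattice, the 
weight function is a plain sum of occurrence counts, and all names (PARENT, abstract_to, compress, 
factorize) are introduced here for illustration only. 

```python
from collections import Counter

# Hypothetical specialization hierarchy, child -> parent.
# Assumption: a tree stands in for the more general lattice.
PARENT = {"bequeath": "give", "donate": "give", "give": "transfer", "sell": "transfer"}

def abstract_to(pred, cut):
    """Return the nearest ancestor of pred (or pred itself) that lies in the cut."""
    current = pred
    while current not in cut:
        if current not in PARENT:
            return pred  # no ancestor in the cut: leave the predicate unchanged
        current = PARENT[current]
    return current

def compress(vector, cut):
    """Abstraction: add each predicate's count to its representative in the cut."""
    compressed = Counter()
    for pred, count in vector.items():
        compressed[abstract_to(pred, cut)] += count
    return compressed

def dot(u, v):
    return sum(c * v[p] for p, c in u.items() if p in v)

def factorize(vector, modules):
    """Factorization: one sub-vector per (hypothetical) subset of predicates."""
    return [Counter({p: c for p, c in vector.items() if p in subset}) for subset in modules]

full = Counter({"bequeath": 2, "donate": 1, "sell": 1, "cause": 1})
small = compress(full, cut={"transfer", "cause"})  # transfer: 4, cause: 1

other = Counter({"donate": 1, "cause": 2})
modules = [{"bequeath", "donate", "give", "sell", "transfer"}, {"cause"}]
partial = [dot(a, b) for a, b in zip(factorize(full, modules), factorize(other, modules))]
```

When the subsets partition the vocabulary, the per-module scores in partial sum to the dot product of the 
unfactored vectors, which is the sense in which results can be accumulated across modules. 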



7.1.4 Combining similarity effects across items 

MAC/FAC is currently a purely exemplar-based memory system. The memory items can be highly 
situation-specific encodings of perceptual stimuli, abstract mathematical descriptions, causal scenarios, etc. 
MAC/FAC lacks the capacity to model inter-item effects. For example, MAC/FAC does not capture competition 
among items. Wharton, Holyoak, Downing, Lange, and Wickens (1991, 1992) and Wharton, Holyoak, 
Downing, Lange, Wickens, and Melz (1994) have shown an intriguing effect where competition between 
exemplars heightens the relative effect of structural similarity in retrieval. MAC/FAC also does not 
average across several items at retrieval (Medin & Schaffer, 1978) or derive a global sense of familiarity 
by combining the activations of multiple retrievals (Gillund & Shiffrin, 1984; Hintzman, 1986, 1988). An 
interesting extension of MAC/FAC would be to include this kind of between-item processing upon 
retrieval. 

If such inter-item averaging occurs, it could provide a route to the incremental construction of abstractions 
and indexing information in memory. We see three plausible ways to do this. First, as above, the 
descriptions output by the MAC stage could be compared. Second, the access system might incrementally 
build up something like Minsky's (1981) similarity network, using the history of retrievals to encode 
difference descriptions to simplify future access. Third, the descriptions output by the FAC stage could be 
compared: SME could be used to carry out structural abstraction across several descriptions (as in 
Skorstad, Medin, & Gentner, 1988) to produce a combined description as the FAC output. The first and 
third models are both forms of "late averaging" accounts, and it would be interesting to compare these 
techniques with other models that account for prototype effects by combining exemplars at retrieval 
(Hintzman, 1986, 1988; Medin & Schaffer, 1978). 

7.1.5 Iterative access 

Keane (1988c, 1991; Keane & Brayshaw, 1988) and Burstein (1983a,b) have proposed incremental 
mapping processes. We suggest that similarity-based retrieval may also be an iterative process. In 
particular, in active retrieval (as opposed to spontaneous remindings) we conjecture that MAC/FAC may 
be used iteratively, each time modifying the probe in response to the previous match (cf. Falkenhainer 
1987, 1990; Gentner 1989). Suppose for example a probe yielded several partial remindings. The system 
of matches could provide clues as to which aspects of the probe are more or less relevant, and thus should 
be highlighted or suppressed on the next iteration. MAC should respond to this altered vector by returning 
more relevant items, and FAC can then select the best of these. 
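
One way to picture this conjecture is the following Python sketch of a single reweighting step. Everything 
specific in it is an assumption made for illustration: the set RELATIONS, the stand-in for the structural 
matcher (it simply reports shared relational predicates and is not SME), the boost and damp factors, and the 
toy memory contents. 

```python
# Hypothetical set of relational predicates, used only by the FAC stand-in.
RELATIONS = {"cause", "greater-than", "flow"}

def dot(u, v):
    return sum(w * v.get(p, 0.0) for p, w in u.items())

def mac(probe, memory, k=2):
    """Cheap filter: names of the k memory items with the highest dot product."""
    return sorted(memory, key=lambda name: dot(probe, memory[name]), reverse=True)[:k]

def mapped_predicates(probe, item):
    """Stand-in for the structural matcher: relational predicates both descriptions use."""
    return RELATIONS & probe.keys() & item.keys()

def reweight(probe, matches, memory, boost=2.0, damp=0.5):
    """Assumed update rule: highlight predicates that took part in a partial match."""
    relevant = set()
    for name in matches:
        relevant |= mapped_predicates(probe, memory[name])
    return {p: w * (boost if p in relevant else damp) for p, w in probe.items()}

probe = {"cause": 1, "greater-than": 1, "flow": 1, "water": 2, "pipe": 2}
memory = {
    "surface": {"water": 2, "pipe": 2, "leak": 1},
    "partial": {"cause": 1, "flow": 1, "water": 1},
    "analog": {"cause": 1, "greater-than": 1, "flow": 1, "heat": 2, "bar": 2},
}

first = mac(probe, memory)                      # surface match ranks first
second = mac(reweight(probe, first, memory), memory)  # analog now passes the filter
```

In this toy memory the first pass returns the surface match and a partial reminding, and the altered vector 
then lets the analog through the filter. Whether human retrieval adjusts the probe in anything like this way 
is exactly the open question raised above. 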

Another advantage of such incremental reminding is that it might help explain how we derive new relational 
categories. Barsalou's (1982, 1987) ad hoc categories, such as "things to take on a picnic" and Glucksberg 
and Keysar's (1990) metaphorically based categories, such as "jail" as a prototypical confining institution, 
are examples of the kinds of abstract relational commonalities that might be highlighted during a process of 
incremental retrieval and mapping. 

7.1.6 Embedding in performance-oriented models 

MAC/FAC is not itself a complete analogical processing system. For example, both constructing a model 
from multiple analogs (e.g., Burstein, 1983a,b) and learning a domain theory by analogy (e.g., 
Falkenhainer 1987, 1988, 1990b) require multiple iterations of accessing, mapping, and evaluating 
descriptions. Several psychological questions about access cannot be studied without embedding 
MAC/FAC in a more comprehensive model of analogical processing. First, as discussed above, there is 
ample evidence that subjects can choose to focus on different kinds of similarity when the items being 
compared are both already in working memory. Embedding MAC/FAC in a larger system should help 
make clear whether this penetrability should be modeled as applying to the FAC system or to a separate 
similarity engine. (Order effects in analogical problem solving (Keane, in press) suggest the latter.) 

A second issue that requires a larger, performance-oriented model to explore via simulation is when and 
how pragmatic constraints should be incorporated (cf. Holyoak & Thagard, 1989; Thagard et al., 1990). 
Since we assume that goals, plans, and similar control knowledge are explicitly represented in working 
memory, the MAC stage will include such predicates in the content vector for the probe, and hence will be 
influenced by pragmatic concerns. There are two ways to model the effects of pragmatics on the FAC 
stage. The first is to use the SME pragmatic marking algorithm (Forbus and Oblinger, 1990) as a 
relevance filter. The second is to use incremental mapping, as in Keane & Brayshaw's (1988) Incremental 
Analogy Machine (IAM). This technique permits the selection and grouping of sets of correspondences to 
be influenced by the task at hand (Forbus, Ferguson, & Gentner, 1994). 

A recent simulation by Lange and Wharton (1992, 1993) called REMIND models retrieval in the context of 
natural language processing, using spreading activation in a connectionist network both to construct a 
conceptual representation from textual input and to find the most similar story in its episodic memory. 
REMIND is an intriguing model, and the attempt to integrate multiple cognitive processes into larger 
models is an important activity. However, it is difficult to compare this model with MAC/FAC and other 
retrieval models. First, REMIND only models a specific retrieval task, namely retrieval in the service of 
understanding stories, and thus does not attempt to cover as wide a span of phenomena as MAC/FAC. 
Second, when REMIND retrieves a story, it does not appear to create correspondences between the 
understanding of its input and the previous story, nor does it generate novel candidate inferences, and thus 
does not satisfy the structured mappings criterion for retrieval. Third, REMIND has only been tested on a 
corpus involving a handful of short (i.e., two-sentence) stories. To our knowledge, it has never been tested 
either on a corpus as large as those used with MAC/FAC and ARCS, or on a corpus that includes 
examples as large as those used with MAC/FAC. Even their current small databases stretch the limits of a 
Connection Machine, which makes it difficult to evaluate their model thoroughly. 

7.1.7 Expertise and Relational Access 

Despite the gloomy picture painted in the present research and in most of the problem-solving research, 
there is evidence of considerable relational access (a) for experts in a domain and (b) when initial encoding 
of the study set is relatively intensive. Novick (1988a,b) studied remindings for mathematics problems 
using novice and expert mathematics students. She found that experts were more likely than novices to 
retrieve a structurally similar prior problem, and when they did retrieve a surface-similar problem, they 
were quicker to reject it than were novices. Faries and Reiser (1988) taught subjects LISP in a series of 
intensive training sessions and then gave them target problems that were superficially similar to one prior 
problem and structurally similar to another. Given this intensive training, Faries and Reiser's subjects were 
able to access structurally similar problems despite the competing superficial similarities. 

The second contributor to relational retrieval, almost certainly related to the first, is intensive encoding. 
Gick and Holyoak (1983) and Catrambone and Holyoak (1987, 1989) found that subjects exhibited 
increased relational retrieval when they were required to compare two prior analogs, but not when they 
were simply given two prior analogs. Schumacher & Gentner (1987) found increased relational retrieval of 
proverbs when subjects wrote out the meaning of each proverb on the study list, as opposed to simply 
reading it or rating its cleverness. Seifert, McKoon, Abelson, and Ratcliff (1986) investigated priming 
effects in a sentence verification task between thematically similar (analogical) stories. They obtained 
priming when subjects first studied a list of themes and then judged the thematic similarity of pairs of 
stories, but not when subjects simply read the stories. 

The increase of relational reminding with expertise and with intensive encoding can be accommodated in 
the MAC/FAC model. First, we assume that experts have richer and better structured representations of the 
relations in the content domain than do novices (Carey, 1985; Chi, 1978; Reed, Ackinclose, & Voss, 1990). 
This fits with developmental evidence that as children come to notice and encode higher-order relations 
such as symmetry and monotonicity, their appreciation of abstract similarity increases (Kotovsky & 
Gentner, 1990; Gentner & Rattermann, 1991). Second, in particular we speculate that experts may have a 
more uniform internal relational vocabulary within the domain of expertise than do novices (Clement, 
Mawby & Giles, 1994; Gentner & Rattermann, 1991; Gentner, Rattermann, Markman, & Kotovsky, in 
press). The idea is that experts tend to have relatively comprehensive theories in a domain and that this 
promotes canonical relational encodings within the domain. 

To the extent that a given higher-order relational pattern is used to encode a given situation, it will of 
course be automatically incorporated into MAC/FAC's content vector. This means that any higher-order 
relational concept that is widely used in a domain will tend to increase the uniformity of the representations 
in memory. This increased uniformity should increase the mutual accessibility of situations within the 
domain. Thus as experts come to encode a domain according to a uniform set of principles, the likelihood 
of appropriate relational remindings increases. That is, under the MAC/FAC model, the differences in 
retrieval patterns for novices and experts are explained in terms of differences in knowledge, rather than by 
the construction of explicit indices. 
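
A small worked example may make the uniformity argument concrete. The Python sketch below compares two 
analogous situations encoded once with idiosyncratic relation names and once with a shared relational 
vocabulary. The encodings, the predicate names, and the use of a length-normalized dot product are 
assumptions chosen for illustration. 

```python
import math
from collections import Counter

def cosine(u, v):
    """Length-normalized dot product of two sparse content vectors (assumed scoring)."""
    dot = sum(c * v[p] for p, c in u.items() if p in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

# Hypothetical novice encodings: each situation gets its own relation name.
novice_a = Counter(["pushes", "ball", "ramp"])
novice_b = Counter(["shoves", "cart", "track"])

# Hypothetical expert encodings: the same higher-order relations appear in both.
expert_a = Counter(["force", "acceleration", "cause", "ball", "ramp"])
expert_b = Counter(["force", "acceleration", "cause", "cart", "track"])

novice_score = cosine(novice_a, novice_b)  # no shared predicates
expert_score = cosine(expert_a, expert_b)  # three of five predicates shared
```

The objects differ in both pairs, so the higher score for the expert encodings comes entirely from the 
shared relational predicates; no explicit index is consulted. 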

Bassok has made an interesting argument that indirectly supports this claim of greater relational uniformity 
for experts than for novices (Bassok & Wu, in preparation). Noting prior findings that in forming 
representations of novel texts people's interpretations of verbs depend on the nouns attached to them 
(Gentner, 1981; Gentner & France, 1988), Bassok suggests that particular representations of the relational 
structure may thus be idiosyncratically related to the surface content, and that this is one contributor to the 
poor relational access. If this is true, and if we are correct in our supposition that experts tend to have a 
relatively uniform relational vocabulary, then an advantage for experts in relational access would be 
predicted. 

As domain expertise increases, MAC/FAC's activity may come to resemble a multi-goal case-based 
reasoning model. We can think of its content vectors as indices with the property that they change 
automatically with any change in the representation of domain exemplars. Thus as domain knowledge - 
particularly the higher-order relational vocabulary - increases, MAC/FAC may come to have sufficiently 
elaborated representations to permit a fairly high proportion of relational remindings. The case-based 
reasoning emphasis on retrieving prior examples and generalizations that are inferentially useful may thus 
be a reasonable approximation to the way experts retrieve knowledge. 



7.1.8 Comparison with recent approaches 



Retrieval models tend to be of two types: Cognitive simulations that are committed to modeling human 
performance in detail, and case-based reasoning systems that, while inspired by human cognition, are 
equally concerned with effective performance on practical problems. Of recent cognitive simulations, the 
closest is the REMIND model of Lange and Wharton (1992, 1993), discussed in Section 7.1.6. Of the 
case-based reasoning systems, one of the closest to our approach seems to be the CaPER system (Kettler, 
Hendler, and Anderson, 1992). CaPER is designed to retrieve all sufficiently similar plans from an 
unindexed case-base, using massive parallelism to do a simple, non-structural match between a query and 
the contents of memory. It would be very interesting to see how well the parallel techniques used in CaPER could 
be applied to MAC/FAC. 




7.2 The Decomposition of Similarity 

The dissociation between surface similarity and structural similarity across different processes has broader 
implications for cognition, and is related to several recent discussions. Medin, Goldstone and Gentner 
(1993) and Gentner (1989) have argued that similarity is pluralistic, in the sense that there are multiple 
subclasses of similarity and multiple influences on how it is computed. Rips (1989) demonstrated a 
dissociation between similarity, typicality, and categorization. Murphy and Medin (1985) and Keil (1989) 
have commented on the limited usefulness of simple similarity and pointed out that physical resemblance 
does not provide a sufficient basis for determining conceptual groupings. As discussed above, there is a 
relational shift in development (Gentner & Rattermann, 1991; Halford, 1992). Finally, local object matches 
appear to be processed faster by adults than structural commonalities. Goldstone and Medin (1994) found 
that local similarities have their effects on mapping earlier than global relational similarities in a timed 
mapping task, and Ratcliff and McKoon (1989) found convergent results in a sentence-matching task: 
subjects could discriminate new from old sentences faster if the new sentences contained all new words 
(e.g., "Helen attracted Jeff." vs. "Andrew accosted Mary.") than if the sentences differed only in relational 
structure (e.g., "Helen attracted Jeff." vs. "Jeff attracted Helen."). In pilot experiments using perceptual 
stimuli, in which subjects were timed under different kinds of mapping instructions, Markman and Gentner 
(in press) have found that subjects are faster to choose on the basis of similar objects than on the basis of 
similar relations, even when the two rules dictate the same response. 

These kinds of results render less plausible the notion of a unitary similarity that governs retrieval, 
evaluation and inference. Instead, they suggest a more complex, pluralistic view of similarity. MAC/FAC 
provides an architecture which demonstrates how such a pluralistic notion of similarity can be organized to 
account for psychological data on retrieval. 